8000 [CEP 21] Run-exports in sharded Repodata by baszalmstra · Pull Request #108 · conda/ceps · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[CEP 21] Run-exports in sharded Repodata #108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 20, 2025

Conversation

baszalmstra
Copy link
Contributor

We propose the store run_exports in sharded repodata shards.

Given that a lot of technical limitations have been mitigated with sharded repodata we propose to also store run_exports in the shards. This allows build tools to acquire run export information more easily and faster.

📝 Rendered

The sharded repodata served by https://prefix.dev already implements this behavior. Rattler-build can also use the run_exports directly from repodata which speeds up resolution during setup as no extra steps are required to determine the run_exports.

@baszalmstra
Copy link
Contributor Author

I also opened a thread on zulip to discuss this proposal in more detail: https://conda.zulipchat.com/#narrow/channel/457607-general/topic/CEP.3A.20run_exports.20in.20sharded.20repodata if anyone is interested. 😄

@baszalmstra
Copy link
Contributor Author
baszalmstra commented Jan 16, 2025

Related to #87 and #88

@jakirkham
Copy link
Member

Thanks Bas! 🙏

Would this allow us to repodata patch run_exports too? 😉

@baszalmstra
Copy link
Contributor Author

Would this allow us to repodata patch run_exports too?

Yes! the last paragraph in the CEP is about that.

@beckermr
Copy link
Contributor

We probably need to specify something about tools that don't use the repodata shards should deal with run export patches. Otherwise, we might have different tools seeing different run exports for a patched package.

@baszalmstra
Copy link
Contributor Author

We probably need to specify something about tools that don't use the repodata shards should deal with run export patches. Otherwise, we might have different tools seeing different run exports for a patched package.

Thats a good point, do you have something in mind?

@beckermr
Copy link
Contributor

Not right now. For channels with a separate run exports JSON blob, that blob can be patched. Channels without this extra file, there is no real way to do this except to have the user download the patches and apply them on the fly. :/

@baszalmstra
Copy link
Contributor Author

I added some extra text to the CEP to explain how run export patches should be handled. Does that make things clearer @beckermr ?

@beckermr
Copy link
Contributor
beckermr commented Feb 6, 2025

I am still confused. The new text covers patching run exports for non-shareded repodata. For sharded repodata, are the run export patches pulled from "run_exports_patch_instructions.json" as well?

Co-authored-by: Matthew R. Becker <beckermr@users.noreply.github.com>
cep-0016-2.md Outdated
Comment on lines 37 to 39
We propose to add a `run_export` field to each record that mimics the specification from CEP-12.

If the `run_export` field is not present in the record it means no `run_export` information is stored with the record, and a fallback mechanism should be used to acquire the run-export information.
Copy link
Contributor
@jjhelmus jjhelmus Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replicating the run_export field in the repodata would result in a lot of "run_exports": {} entries. These will compress well but would it be more efficient to store a top level field that indicates that records have an empty run_exports unless they are explicitly declared? This would add complexity at the benefit of smaller(?) repodata.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We think that for shards, using msgpack + zst, it's fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus the shards have much improved caching behavior (content-addressed) so that total download will always be much much lower vs. the current situation.

Use `patch_instructions_version: 2`
@wolfv
Copy link
Contributor
wolfv commented Feb 28, 2025

We (@baszalmstra, @wolfv) would like to open the vote on this CEP.

Voting period: 2 weeks. Vote ends on Friday, March 14th 2025.

Please use the check box to cast your vote.

Votes

@xhochy (Uwe Korn)

  • yes
  • no
  • abstain

@CJ-Wright (Christopher J. 'CJ' Wright)

  • yes
  • no
  • abstain

@mariusvniekerk (Marius van Niekerk)

  • yes
  • no
  • abstain

@chenghlee (Cheng H. Lee)

  • yes
  • no
  • abstain

@ocefpaf (Filipe Fernandes)

  • yes
  • no
  • abstain

@marcelotrevisani (Marcelo Duarte Trevisani)

  • yes
  • no
  • abstain

@msarahan (Michael Sarahan)

  • yes
  • no
  • abstain

@mbargull (Marcel Bargull)

  • yes
  • no
  • abstain

@jakirkham (John Kirkham)

  • yes
  • no
  • abstain

@jezdez (Jannis Leidel)

  • yes
  • no
  • abstain

@wolfv (Wolf Vollprecht)

  • yes
  • no
  • abstain

@jaimergp (Jaime Rodríguez-Guerra)

  • yes
  • no
  • abstain

@baszalmstra (Bas Zalmstra)

  • yes
  • no
  • abstain

@beckermr (Matthew R. Becker)

  • yes
  • no
  • abstain

@Hind-M (Hind Montassif)

  • yes
  • no
  • abstain

@trallard (Tania Allard)

  • yes
  • no
  • abstain

@beckermr
Copy link
Contributor

It's too late now, but future specs really should separate the rationale from the spec itself.

@dholth
Copy link
Contributor
dholth commented Feb 28, 2025

Not sure about the patching section. I've been thinking about sharded repodata as the primary form of repodata, in which case there would not be a separate run_exports.json unless you pass a flag to generate the legacy format.

With the run-exports part of the repodata we propose to also allow repodata patching these fields. The original run-exports can still be extracted from the packages. We do not see a reason how the run-export information is different from other patchable information stored in the repodata (like the dependencies).

To facilitate these patches, the patch_instructions_version in patch_instructions.json files is incremented to 2. A patch_instructions_version: 2 file CAN contain a run_exports: ... field in the patch instructions that MUST be used to patch generated run_exports.json files.

@beckermr
Copy link
Contributor

@dholth, the patching run_exports is there for tooling that uses the kind of repodata anaconda.org currently outputs. It says that if a tool encounters the v2 package patching instructions and that tool is producing run_exports.json, then that tool must patch the run exports.

The spec does not say that one has to produce run_esports.json no matter what.

@wolfv
Copy link
Contributor
wolfv commented Mar 20, 2025

🎉 This CEP was accepted with:

Total voters: 16 (valid: 13 = 81.25%)

Yes votes (13 / 100.00%):

No votes (0 / 0.00%)):

Abstain votes (0 / 0.00%):

Not voted (3):

Invalid votes (0):

@wolfv wolfv merged commit 157704d into conda:main Mar 20, 2025
1 check failed
@beckermr
Copy link
Contributor

I've turned on required status check for this repo. Please do not merge if the linter is complaining.

@beckermr
Copy link
Contributor

We've missed

  • change in CEP title / correct title formatting
  • adding the CEP to the readme
  • adding the resolution section describing the vote

Not all of those are caught by the linter, but the first item is for sure.

@wolfv
Copy link
Contributor
wolfv commented Mar 20, 2025

sorry about that. For a long time the CI was always red on this repo so I ignored it.

@jaimergp
Copy link
Contributor

Added #120 with necessary fixes

@jaimergp jaimergp changed the title CEP: run_exports in sharded repodata [CEP 21] Run-exports in sharded Repodata Mar 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants
0