[CEP 21] Run-exports in sharded Repodata #108
I also opened a thread on Zulip to discuss this proposal in more detail: https://conda.zulipchat.com/#narrow/channel/457607-general/topic/CEP.3A.20run_exports.20in.20sharded.20repodata if anyone is interested. 😄
Thanks Bas! 🙏 Would this allow us to repodata patch
Yes! The last paragraph in the CEP is about that.
We probably need to specify how tools that don't use the repodata shards should deal with run export patches. Otherwise, different tools might see different run exports for a patched package.
That's a good point, do you have something in mind?
Not right now. For channels with a separate run exports JSON blob, that blob can be patched. For channels without this extra file, there is no real way to do this except to have the user download the patches and apply them on the fly. :/
I added some extra text to the CEP to explain how run export patches should be handled. Does that make things clearer @beckermr?
I am still confused. The new text covers patching run exports for non-sharded repodata. For sharded repodata, are the run export patches pulled from "run_exports_patch_instructions.json" as well?
Co-authored-by: Matthew R. Becker <beckermr@users.noreply.github.com>
cep-0016-2.md
Outdated
We propose to add a `run_export` field to each record that mimics the specification from CEP-12.

If the `run_export` field is not present in the record it means no `run_export` information is stored with the record, and a fallback mechanism should be used to acquire the run-export information.
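For illustration, the rule in the quoted text could be sketched roughly as follows. This is only a sketch under assumptions: the key name `run_exports` (the quoted text writes `run_export`), the helper `resolve_run_exports`, and the `fallback` callable are all hypothetical and are not taken from the CEP or from any tool. The point it shows is that a present but empty field is an answer, while an absent field means "look elsewhere".

```python
from typing import Callable, Dict, List, Optional

# Hypothetical shape: run-export kind (e.g. "weak") -> list of match specs.
RunExports = Dict[str, List[str]]


def resolve_run_exports(
    record: dict,
    fallback: Callable[[dict], Optional[RunExports]],
) -> Optional[RunExports]:
    """Return the run exports for a repodata record.

    A present field (even an empty dict) is authoritative; only a missing
    field triggers the fallback mechanism.
    """
    if "run_exports" in record:  # assumed key name
        return record["run_exports"]
    # Field absent: no information stored with the record, so ask the
    # fallback (for example a separate run_exports.json lookup).
    return fallback(record)
```

A caller would pass whatever fallback its channel supports, for instance a function that reads a separate run exports JSON blob, or one that returns `None` when nothing is available.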
Replicating the `run_export` field in the repodata would result in a lot of `"run_exports": {}` entries. These will compress well, but would it be more efficient to store a top-level field that indicates that records have an empty `run_exports` unless they are explicitly declared? This would add complexity for the benefit of smaller(?) repodata.
We think that for shards, using msgpack + zst, it's fine.
Plus the shards have much improved caching behavior (content-addressed) so that total download will always be much much lower vs. the current situation.
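A rough way to get a feel for the "these will compress well" argument is to compare raw and compressed sizes with and without the empty entries. The sketch below is an assumption-heavy stand-in: it uses JSON and `zlib` from the standard library instead of msgpack and zst, and the records are synthetic, so the absolute numbers will not match real shards. Only the relative overhead is of interest.

```python
import json
import zlib


def encoded_sizes(records: dict) -> tuple:
    """Return (raw_bytes, compressed_bytes) for a mapping of records."""
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))


# Synthetic records; names and fields are made up for the comparison.
base = {
    f"pkg-{i}-1.0-0.conda": {
        "name": f"pkg-{i}",
        "version": "1.0",
        "depends": ["python >=3.9"],
    }
    for i in range(5000)
}
# Same records, each carrying an explicit empty run_exports entry.
with_empty = {fn: {**rec, "run_exports": {}} for fn, rec in base.items()}

raw_a, comp_a = encoded_sizes(base)
raw_b, comp_b = encoded_sizes(with_empty)

print(f"raw overhead:        {raw_b - raw_a} bytes")
print(f"compressed overhead: {comp_b - comp_a} bytes")
```

Because the added entry is the same byte sequence in every record, the compressor can encode it as a repeat, which is why the compressed overhead should come out far smaller than the raw one.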
Use `patch_instructions_version: 2`
We (@baszalmstra, @wolfv) would like to open the vote on this CEP. Voting period: 2 weeks. Vote ends on Friday, March 14th 2025. Please use the check box to cast your vote.
Votes
@xhochy (Uwe Korn)
@CJ-Wright (Christopher J. 'CJ' Wright)
@mariusvniekerk (Marius van Niekerk)
@chenghlee (Cheng H. Lee)
@ocefpaf (Filipe Fernandes)
@marcelotrevisani (Marcelo Duarte Trevisani)
@msarahan (Michael Sarahan)
@mbargull (Marcel Bargull)
@jakirkham (John Kirkham)
@jezdez (Jannis Leidel)
@wolfv (Wolf Vollprecht)
@jaimergp (Jaime Rodríguez-Guerra)
@baszalmstra (Bas Zalmstra)
@beckermr (Matthew R. Becker)
@Hind-M (Hind Montassif)
@trallard (Tania Allard)
It's too late now, but future specs really should separate the rationale from the spec itself.
Not sure about the patching section. I've been thinking about sharded repodata as the primary form of repodata, in which case there would not be a separate
@dholth, the run_exports patching is there for tooling that uses the kind of repodata anaconda.org currently outputs. It says that if a tool encounters the v2 package patching instructions and that tool is producing run_exports.json, then that tool must patch the run exports. The spec does not say that one has to produce run_exports.json no matter what.
🎉 This CEP was accepted with:
Total voters: 16 (valid: 13 = 81.25%)
Yes votes (13 / 100.00%)
No votes (0 / 0.00%)
Abstain votes (0 / 0.00%)
Not voted (3)
Invalid votes (0)
I've turned on the required status check for this repo. Please do not merge if the linter is complaining.
We've missed
Not all of those are caught by the linter, but the first item is for sure.
Sorry about that. For a long time the CI was always red on this repo, so I ignored it.
Added #120 with the necessary fixes.
We propose to store `run_exports` in sharded repodata shards.
Given that many technical limitations have been mitigated by sharded repodata, we propose to also store `run_exports` in the shards. This allows build tools to acquire run export information more easily and faster.
The sharded repodata served by https://prefix.dev already implements this behavior. Rattler-build can also use the run_exports directly from repodata, which speeds up resolution during setup because no extra steps are required to determine the run_exports.