Editorial changes to CEP 21 by jaimergp · Pull Request #120 · conda/ceps · GitHub

Editorial changes to CEP 21 #120


Merged: 4 commits, Mar 20, 2025
1 change: 1 addition & 0 deletions README.md
@@ -31,6 +31,7 @@ for conda's implementation, all major changes should be submitted as
| [0018](cep-0018.md) | Migration to the Zulip chat platform |
| [0019](cep-0019.md) | Computing the hash of the contents in a directory |
| [0020](cep-0020.md) | Support for `abi3` Python packages |
+| [0021](cep-0021.md) | Run-exports in sharded Repodata |

## References

16 changes: 9 additions & 7 deletions cep-0021.md
@@ -1,5 +1,7 @@
+# CEP 21 - Run-exports in sharded Repodata
+
<table>
-<tr><td> Title </td><td> Run-exports in sharded Repodata. </td>
+<tr><td> Title </td><td> CEP 21 - Run-exports in sharded Repodata </td></tr>
<tr><td> Status </td><td> Approved </td></tr>
<tr><td> Author(s) </td><td> Bas Zalmstra &lt;bas@prefix.dev&gt;</td></tr>
<tr><td> Created </td><td> Jan 16, 2025</td></tr>
@@ -8,24 +10,24 @@
<tr><td> Implementation </td><td> NA </td></tr>
</table>

-# Run-exports in sharded Repodata
+## Abstract

We propose to add run-export information to the sharded repodata shards.

## Motivation

When building conda packages the build infrastructure needs to extract run-export information from conda packages in the host- and build environments.
-Run-export information is stored in a package and can be extracted by downloading the package and extracting the `run_exports.json` file.
+Run-export information is stored in a package and can be extracted by downloading the package and extracting the `run_exports.json` file.
Even with the possibility to stream parts of `.conda` files this is a relatively resource-intensive operation.

-[CEP-12](https://github.com/conda/ceps/blob/main/cep-0012.md) formalized a `run_exports.json` file that is stored next `repodata.json` file.
+[CEP 12](https://github.com/conda/ceps/blob/main/cep-0012.md) formalized a `run_exports.json` file that is stored next `repodata.json` file.
However, not all channels on the default server (conda.anaconda.org) provide this information which means falling back to downloading and extracting this information from the packages. It is possible to extract the data by only sparsly reading the file but the overhead is still relatively large.

Having two separate files also poses some problems as extra mechanisms have to be introduced in the build infrastructure to manage and sync both files on the build machines.

## Specification

-CEP-12 mentions the following reasons for splitting the information into two files:
+CEP 12 mentions the following reasons for splitting the information into two files:

> * It would require extending the repodata schema, currently not formally standardized.
> * It would increase the size of the already heavy repodata.json files.
@@ -34,7 +36,7 @@ We propose that these reasons no longer hold with [sharded repodata](https://git

**It would require extending the repodata schema, currently not formally standardized.**

-We propose to add a `run_export` field to each record that mimics the specification from CEP-12.
+We propose to add a `run_export` field to each record that mimics the specification from CEP 12.

If the `run_export` field is not present in the record it means no `run_export` information is stored with the record, and a fallback mechanism should be used to acquire the run-export information.
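
Note (not part of the diff): the fallback rule described in the paragraph above can be sketched in Python. This is a minimal sketch under stated assumptions, not the CEP's reference behaviour. `resolve_run_exports` and the `fallback` callable are hypothetical names, the record is assumed to be an already-parsed dict, and the `weak` key in the usage lines is only assumed to follow the CEP 12 layout. The `run_export` field name and the present/absent rule are the only parts taken from the CEP text.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]
RunExports = Dict[str, Any]


def resolve_run_exports(
    record: Record,
    fallback: Callable[[Record], RunExports],
) -> RunExports:
    """Return run-export info for one repodata record.

    If the record carries a `run_export` field, use it as-is.
    If the field is absent, nothing is stored with the record, so
    defer to a fallback (hypothetical; for example reading a separate
    run_exports.json or extracting the file from the package).
    """
    if "run_export" in record:
        return record["run_export"]
    return fallback(record)


# Usage with made-up records; the package names are illustrative only.
def fake_fallback(record: Record) -> RunExports:
    # Stand-in for the real fallback mechanism (assumption).
    return {"source": "fallback", "name": record["name"]}


with_field = {"name": "libfoo", "run_export": {"weak": ["libfoo >=1.2,<2"]}}
without_field = {"name": "libbar"}

from_record = resolve_run_exports(with_field, fake_fallback)
from_fallback = resolve_run_exports(without_field, fake_fallback)
```

The sketch tests for the presence of the key rather than its truthiness, so a present but empty `run_export` value is treated as stored information and does not trigger the fallback. That is one reading of the paragraph above, not something the CEP states explicitly.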

@@ -55,7 +57,7 @@ Let's take a look at the current sizes of run_exports.json and repodata.json fil

Since the repodata shards are also compressed we can conclude that in practice adding run exports information would increase the size of the repodata shards by roughly 5-6%.

-With the introduction of sharded repodata in [CEP-16](https://github.com/conda/ceps/blob/main/cep-0016.md) the issues with size (and scale) have been effectively mitigated. Adding 5-6% to the total size of the shards will not pose a risk since all advantages of sharded repodata mentioned in the CEP still hold.
+With the introduction of sharded repodata in [CEP 16](https://github.com/conda/ceps/blob/main/cep-0016.md) the issues with size (and scale) have been effectively mitigated. Adding 5-6% to the total size of the shards will not pose a risk since all advantages of sharded repodata mentioned in the CEP still hold.

**(Typed) repodata parsers would need to be updated to handle the new field.**
