From 3a88a3e9852bb200bf37c563622e262f5513665a Mon Sep 17 00:00:00 2001
From: Wolf Vollprecht
Date: Wed, 5 Feb 2025 13:14:01 +0100
Subject: [PATCH] repodata-next

---
 cep-repodata-next.md | 119 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)
 create mode 100644 cep-repodata-next.md

# CEP XX: New repodata and matchspec features

| Title          | New repodata and matchspec features |
| -------------- | ----------------------------------- |
| Status         | Draft |
| Author(s)      | Wolf Vollprecht |
| Created        | 2025-02-05T12:10:51Z |
| Updated        | 2025-02-05T12:10:58Z |
| Discussion     | link to the PR where the CEP is being discussed, NA is circulated initially |
| Implementation | Flags: https://github.com/conda/rattler/pull/1040, Optional: https://github.com/conda/rattler/pull/1019, Conditional: https://github.com/prefix-dev/resolvo/pull/101 |

The conda ecosystem finds and resolves packages based on "repodata": the main index metadata for all artifacts in the condaverse.

The repodata format has been stable for a long time. Each package record contains at least the following fields:

```yaml
name: str               # name of the package
version: str            # version of the package
build: str              # build string of the package
build_number: int       # build number of the package
depends: [MatchSpec]    # dependencies, each expressed as a "triplet" MatchSpec
constrains: [MatchSpec] # constraints that only apply if the constrained package is part of the environment (sometimes called optional dependencies)
```

The remaining fields mostly carry metadata and are not listed here.

With this CEP we propose three additions to repodata, published in a new `repodata.v2` format.
These additions serve three different purposes:

- `extras`: optional dependency sets, as known from the PyPI world. For example, `sqlalchemy` might be a small base package that defines extras such as `mysql`, `postgres` and `sqlite`, each pulling in an additional set of dependencies when requested.
- Conditional dependencies, also widely used in the Python world (environment markers in PEP 508). They are only activated when their condition is true. For example, a dependency such as `pywin32` is only relevant on Windows, not on macOS or Linux.
- `flags` make it easier to select variants. Compiled packages can often be built with different options, which results in different variants (for example, Debug vs. Release builds). With `flags`, selecting the preferred build becomes trivial with a syntax such as `foobar[flags=['release']]`. Flags are free-form, so distributions such as conda-forge can also use them to distinguish GPU from non-GPU builds.

## Extra dependency sets

We propose a new `extras` key on each repodata record. Its value is a dictionary mapping the name of each extra to a list of MatchSpecs:

```yaml
name: sqlalchemy
version: 1.0.0
depends:
  - python >=3.8
extras:
  sqlite:
    - sqlite >=1.5
    - py-sqlite-adapter 1.0
  postgres:
    - postgres >=3.5
    - pyxpgres >=8
```

When a user or a dependency selects an extra through a MatchSpec, the extra and its dependencies are "activated". This is conceptually the same as publishing three packages, `sqlalchemy`, `sqlalchemy-sqlite` and `sqlalchemy-postgres`, where each extra package depends exactly on the matching base package and adds the extra's dependencies. A number of packages on conda-forge use this workaround today.

## Conditional dependencies

A conditional dependency is only activated when its condition is true. The most straightforward conditions are `__unix`, `__win` and other platform virtual packages. However, we would also like to support MatchSpecs as conditions, such as `python >=3`.
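The activation rules described so far (extras switched on by selection, conditional dependencies switched on by a true condition) can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own, not the proposed implementation: records are plain dicts shaped like the YAML above; conditional dependencies live in a hypothetical `conditional_depends` list of `(spec, condition)` pairs instead of being parsed from a MatchSpec string; a condition is a package name with an optional minimum version; and versions are simplified to integer tuples.

```python
from typing import Optional

# Simplified condition: (package name, optional minimum version).
# Real conditions would be MatchSpecs such as `python >=3`, and real conda
# version ordering is richer than the tuple comparison used here.
Condition = tuple[str, Optional[tuple[int, ...]]]


def condition_holds(cond: Condition, present: dict[str, tuple[int, ...]]) -> bool:
    """True if the named package is present and meets the minimum version."""
    name, min_version = cond
    if name not in present:
        return False
    return min_version is None or present[name] >= min_version


def active_dependencies(
    record: dict,
    selected_extras: set[str],
    present: dict[str, tuple[int, ...]],
) -> list[str]:
    """Return the dependency specs of `record` that are active."""
    active = list(record.get("depends", []))
    extras = record.get("extras", {})
    for extra in sorted(selected_extras):  # sorted for a stable order
        if extra not in extras:
            raise KeyError(f"{record['name']} has no extra {extra!r}")
        active.extend(extras[extra])
    # Hypothetical field holding (spec, condition) pairs.
    for spec, cond in record.get("conditional_depends", []):
        if condition_holds(cond, present):
            active.append(spec)
    return active


sqlalchemy = {
    "name": "sqlalchemy",
    "depends": ["python >=3.8"],
    "extras": {
        "sqlite": ["sqlite >=1.5", "py-sqlite-adapter 1.0"],
        "postgres": ["postgres >=3.5", "pyxpgres >=8"],
    },
    "conditional_depends": [("pywin32", ("__win", None))],
}

# On Linux, with the `postgres` extra selected:
print(active_dependencies(sqlalchemy, {"postgres"}, {"__unix": (0,), "python": (3, 12)}))
# ['python >=3.8', 'postgres >=3.5', 'pyxpgres >=8']
```

The sketch evaluates conditions against a fixed set of packages that are already known (virtual packages such as `__win`, or packages already chosen). A real solver cannot always do that, because a condition like `python >=3` may depend on decisions it has not made yet.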
The proposed syntax extends the `MatchSpec` syntax by appending `; if <condition>` to an existing MatchSpec:

```yaml
name: sqlalchemy
version: 1.0.0
depends:
  - python >=3.8
  - pywin32; if __win
  - six; if python <3.8
```

We would also like to allow combining conditions with `and` and `or`:

```
...; if python <3.8 and numpy >=2.0
...; if python >=3.8 or pypy
```

Note: conda-forge already achieves this today, less elegantly, by publishing multiple noarch packages that depend on `__unix` or `__win`. Conceptually, conditional dependencies are similar to building multiple variants of a package.

## Flags for the repodata

In the conda ecosystem it is very natural to build several variants of a package with different properties: the BLAS implementation, the GPU/CUDA version and other variables make up the variant matrix of many recipes.

However, it is not easy to specify which variant a user actually wants in conda today. Most of the time, string matching on the build string is used to select one of the options, such as `pytorch 2.5.* *cuda`.

Another workaround uses `mutex` packages: constraining a mutex such as `blas_impl * mkl` selects only packages that also depend on its MKL build.

A flexible, powerful, yet simple syntax to enable or disable "flags" on packages would make selecting a variant much easier.
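To make the build-string workaround concrete: selecting a variant by build string is a glob over one opaque string, so the result depends on how each builder happened to order its parts. The sketch below is not conda's own matcher; Python's `fnmatch.fnmatchcase` stands in for full-string glob matching (an assumption), and the build strings are invented for illustration.

```python
from fnmatch import fnmatchcase

# Invented build strings for three variants of one package version.
BUILDS = (
    "cuda118_py311h1234567_0",
    "cpu_py311h89abcde_0",
    "cuda118_mkl_py311h7654321_0",
)


def select(pattern: str, builds: tuple[str, ...] = BUILDS) -> list[str]:
    """Keep the build strings that the glob matches in full (fnmatch semantics)."""
    return [b for b in builds if fnmatchcase(b, pattern)]


print(select("*cuda"))       # []: the glob must also match the end of the string
print(select("*cuda*"))      # both CUDA builds
print(select("*cuda*mkl*"))  # only the CUDA + MKL build
print(select("*mkl*cuda*"))  # []: same properties, different order in the string
```

Flags avoid this by turning each property into its own token that can be matched regardless of its position.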
Each repodata record gets a new `flags` field holding a list of strings, for example:

```yaml
name: pytorch
version: "2.5.0"
# Flags are free-form: each distribution can define its own set.
flags: ["gpu:cuda", "blas:mkl", "archspec:4", "release"]
```

Flags can then be matched with the following syntax:

- `release`: only packages with the `release` flag are used
- `~release`: packages with the `release` flag are excluded
- `?release`: if packages with the `release` flag are available, only those are used; otherwise any package is accepted
- `gpu:*`: any flag starting with `gpu:` matches
- `archspec:>2`: matches a flag starting with `archspec:` whose remainder, interpreted as a number, satisfies the comparison (here, greater than 2)
- `?archspec:>2`: if a flag starting with `archspec:` is present, it must satisfy the comparison; otherwise this filter is ignored

From a user's perspective, this would look like:

```shell
conda install 'pytorch[version=">=3.1", flags=["gpu:*", "?release"]]'
```

## Backwards Compatibility

The new `repodata.v2` will be served as `repodata.v2.json`, alongside the current `repodata.json`. Older conda clients will continue to use the v1 format. Packages that use any of the new features will not be added to the v1 `repodata.json`.
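A minimal sketch of that split, as a channel indexer might perform it. Assumptions: records are plain dicts shaped like the YAML examples in this CEP rather than the real `repodata.json` layout (which keys records by filename under `packages` and `packages.conda`); `repodata.v2.json` is assumed to list every record; and the function names are hypothetical.

```python
# Conditional dependencies are marked by a `; if <condition>` suffix.
CONDITION_MARKER = "; if "


def uses_v2_features(record: dict) -> bool:
    """True if the record relies on extras, flags or conditional dependencies."""
    if record.get("extras") or record.get("flags"):
        return True
    return any(CONDITION_MARKER in spec for spec in record.get("depends", []))


def split_repodata(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (v1, v2) lists.

    Assumes repodata.v2.json lists every record, while repodata.json only
    keeps records that older clients can understand.
    """
    v1 = [r for r in records if not uses_v2_features(r)]
    v2 = list(records)
    return v1, v2


records = [
    {"name": "six", "depends": ["python"]},
    {"name": "sqlalchemy", "depends": ["python >=3.8", "pywin32; if __win"]},
    {"name": "pytorch", "flags": ["gpu:cuda", "release"]},
]
v1, v2 = split_repodata(records)
print([r["name"] for r in v1])  # ['six']
print(len(v2))                  # 3
```

A real indexer would also have to catch new MatchSpec keys (for example `flags=[...]`) inside `depends`; that check is omitted here.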