CEP for the next evolution of Repodata (v2) and MatchSpec by wolfv · Pull Request #111 · conda/ceps

CEP for the next evolution of Repodata (v2) and MatchSpec #111

119 changes: 119 additions & 0 deletions cep-repodata-next.md
@@ -0,0 +1,119 @@
# CEP XX: New repodata and matchspec features

| Title | A short title of the proposal |
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Status | Draft |
| Author(s) | Wolf Vollprecht <w.vollprecht@gmail.com> |
| Created | 2025-02-05T12:10:51Z |
| Updated | 2025-02-05T12:10:58Z |
| Discussion | link to the PR where the CEP is being discussed, NA is circulated initially |
| Implementation | Flags: https://github.com/conda/rattler/pull/1040, Optional: https://github.com/conda/rattler/pull/1019, Conditional: https://github.com/prefix-dev/resolvo/pull/101 |


The conda ecosystem finds and resolves packages based on "repodata" - the main index metadata for all artifacts in the condaverse.

Repodata has been a stable format for a long time now. It generally consists of at least the following fields:

```yaml
name: name of the package
version: version of the package
build_string: build string of the package
build_number: build number of the package
depends: [MatchSpec] dependencies of the package, expressed as "triplet" matchspec
constrains: [MatchSpec] constraints that the package adds to the resolution aka optional dependencies
```

All other fields are mostly for metadata purposes and not listed.

With this CEP we would like to add 3 new fields to a proposed "repodata.v2" format.

The fields serve three different purposes:

- `extras`: optional dependency sets as known from the PyPI world. For example, `sqlalchemy` might be a small base package that defines a number of extras such as `mysql`, `postgres`, `sqlite` that would pull in dependency sets as needed.
- `conditional` dependencies, also widely known from the Python world. These are activated only when the condition is true. For example, certain dependencies such as `pywin32` are only relevant on Windows and not on macOS or Linux.
- `flags` are used to make it easier to select variants. Compiled packages can often be compiled with different options which results in different variants (for example, Debug vs. Release builds). With `flags` it will be trivial to select the preferred build with a syntax such as `foobar[flags=['release']]`. Flags are free-form and can be used by distributions such as conda-forge to differentiate between gpu and non-gpu builds as well.

## Extra dependency sets

We want to define a new `extras` key in `RepoData`. The key will be a dictionary mapping from String to list of MatchSpecs:

```yaml
name: sqlalchemy
version: 1.0.0
depends:
  - python >=3.8
extras:
  sqlite:
    - sqlite >=1.5
    - py-sqlite-adapter 1.0
  postgres:
    - postgres >=3.5
    - pyxpgres >=8

When a user, or a dependency, selects an extra through a MatchSpec, the extra and its dependencies are "activated". This is conceptually the same as having three packages with "exact" dependencies from the "extra" to the base package: `sqlalchemy`, `sqlalchemy-sqlite` and `sqlalchemy-postgres`. That is the workaround currently employed by a number of packages on conda-forge.
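
As a rough illustration of the activation described above, the following sketch merges the dependencies of requested extras into the base dependency list. The record layout mirrors the YAML example; the function name `activated_depends` and the choice to raise an error for an unknown extra are assumptions made for this sketch, not behavior defined by the CEP.

```python
# Hypothetical record layout mirroring the YAML example; not a real conda API.
RECORD = {
    "name": "sqlalchemy",
    "version": "1.0.0",
    "depends": ["python >=3.8"],
    "extras": {
        "sqlite": ["sqlite >=1.5", "py-sqlite-adapter 1.0"],
        "postgres": ["postgres >=3.5", "pyxpgres >=8"],
    },
}


def activated_depends(record, requested_extras):
    """Return the base dependencies plus those of every requested extra."""
    deps = list(record["depends"])
    available = record.get("extras", {})
    for extra in requested_extras:
        if extra not in available:
            # Assumption: the CEP does not yet define this case; we fail loudly.
            raise KeyError(f"{record['name']} has no extra named {extra!r}")
        deps.extend(available[extra])
    return deps


print(activated_depends(RECORD, ["sqlite"]))
# ['python >=3.8', 'sqlite >=1.5', 'py-sqlite-adapter 1.0']
```

A solver would more likely model each extra as a virtual package pinned to the base record, but the resulting dependency set should be the same.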

Review comment (Contributor):

I think we should clarify what happens if an extra is requested for a package but the selected variant doesn't provide that extra. For example, what happens if I depend on `foobar[extras=["doesntexist"]]`?

Review comment (Contributor):

Maybe I missed it, but it's also not defined how to depend on an extra?


## Conditional dependencies

Conditional dependencies are activated when the condition is true. The most straightforward conditions are `__unix`, `__win` and other platform specifiers. However, we would also like to support matchspecs in conditions such as `python >=3`.

Review comment:

I very much like this idea. An implementation note: while `if __unix` is reasonably easy to implement because it is "static", `if python <3.8` is conceptually much harder as it is not something that can be decided ahead of solving. It requires being able to adapt the dependencies of a package as partial candidates are investigated during the solve.

Review comment (Contributor):

Yep, this is a form of boolean dependencies in rpm (already supported by libsolv I believe)

Review comment (Contributor):

This is the PR that implements this for resolvo: prefix-dev/resolvo#136


The proposed syntax is:

```yaml
name: sqlalchemy
version: 1.0.0
depends:
- python >=3.8
- pywin32; if __win
- six; if python <3.8
```

This extends the `MatchSpec` syntax by appending `; if <CONDITION>` to an existing MatchSpec.
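
To make the `; if <CONDITION>` form concrete, here is a minimal sketch that splits such a string and resolves the "static" platform conditions up front. The helper names `split_conditional` and `static_depends` are hypothetical, and the sketch assumes a bare virtual package name (such as `__win`) is true exactly when that virtual package is present. Conditions that are matchspecs are only collected, since they cannot be decided before solving.

```python
def split_conditional(spec):
    """Split 'pywin32; if __win' into ('pywin32', '__win'); condition is None if absent."""
    base, sep, condition = spec.partition("; if ")
    if not sep:
        return spec.strip(), None
    return base.strip(), condition.strip()


def static_depends(depends, virtual_packages):
    """Resolve conditions that name a single virtual package; defer all others.

    Returns (active, deferred): `active` holds specs that apply right away,
    `deferred` holds (spec, condition) pairs that need the solver to decide.
    """
    active, deferred = [], []
    for spec in depends:
        base, condition = split_conditional(spec)
        if condition is None:
            active.append(base)
        elif condition.startswith("__") and " " not in condition:
            if condition in virtual_packages:
                active.append(base)
        else:
            deferred.append((base, condition))
    return active, deferred


depends = ["python >=3.8", "pywin32; if __win", "six; if python <3.8"]
print(static_depends(depends, {"__win"}))
# (['python >=3.8', 'pywin32'], [('six', 'python <3.8')])
```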

Review comment (Contributor):

It would be nice if we didn't need the `;` because somehow conda will happily ignore the `if` parts while parsing MatchSpecs.

>>> from conda.models.match_spec import MatchSpec as M
>>> M("python 3.8 * if python")
MatchSpec("python==3.8[build=*]")
>>> M("python 3.8 'if' if  __win")  # quote 'if' to force parse it as a build string
MatchSpec("python==3.8='if'")

Review comment (Contributor):

It will also ignore parenthesised blocks:

>>> M("python 3.8 * (__win)")
MatchSpec("python==3.8[build=*]")
>>> M("python 3.8 (__win)")
MatchSpec("python==3.8")
>>> M("python 3.8 (__win and __osx)")
MatchSpec("python==3.8")
>>> M("python 3.8 (if __win and __osx)")
MatchSpec("python==3.8")

Review comment (Contributor, @jaimergp, Mar 30, 2025):

libmamba also ignores parentheses:

>>> from libmambapy.specs import MatchSpec as LibmambaMatchSpec
>>> print(LibmambaMatchSpec.parse("python 3.8 * (__win and __osx)"))
python==3.8
>>> print(LibmambaMatchSpec.parse("python 3.8 * (if __win and __osx)"))
python==3.8

Review comment (Contributor, @jaimergp, Mar 30, 2025):

My suggestion would be to design the syntax like this:

name [version [build]] ('if' condition)

The `if` literal could be omitted, or replaced with `with`, if folks feel it's clearer that way. See discussion in #conda-maintainers > Conditional dependencies syntax in v2 environments & recipes @ 💬.

Review comment:

Another idea (not sure if it is a good one): rather than make MatchSpecs more complex than they already are, build on the idea of selectors from the new recipe format and allow `depends` to contain objects like so:

depends:
  - python >=3.8
  - if: python <3.8
    then: six

In the recipe format, this is expressed via a variation on the selector to avoid being processed at build time (`if(run):` for instance).

This basically disallows conditionals outside of recipes (or formats with selectors), but makes for a more consistent narrative around conditionals.

Review comment (Contributor):

The idea was indeed also posted on Zulip. One issue is that it then becomes harder for CLI tools to adopt.

Review comment (Contributor):

I propose we use the `when` syntax without a semicolon, to force an error on older versions of conda that don't support it and to distinguish it from the already established `if` syntax in recipe v1. We can use the same keyword in a more expanded form as the "build spec":

foobar when python >=3.8

E.g. in a recipe:

- if: unix
  then: foobar
  when: python >=3.8
  
# OR
  
- when: python >=3.8
  then: foobar
 
# OR 
  
- "foobar when python >=3.8"


We would like to also allow for AND and OR with the following syntax:

Review comment (Contributor):

You probably need NOT and parentheses for precedence overrides, right?

Review comment:

In regular MatchSpec, we have `,` and `|` used in versions for "and" and "or". I think we should keep things similar, even if it means supporting `and` and `or` in versions.

Review comment (Contributor):

I find this hard to distinguish when parsing the version. E.g.

python <3.8|>3.9 | numpy >=2.0
python <3.8|>3.9 or numpy >=2.0

I find the version with or easier to read than the one with pipes.


```
...; if python <3.8 and numpy >=2.0
...; if python >=3.8 or pypy
```
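
The following sketch shows one way such combined conditions could be evaluated against a fixed set of already chosen packages. It rests on several assumptions not stated in the CEP: `and` binds tighter than `or`, there is no `not` or parentheses, each term is either a bare name or `name <op><version>`, and versions are dotted integers. A real solver would decide these conditions during the solve rather than against a fixed mapping.

```python
import operator
import re

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def parse_version(text):
    """Simplified: dotted integers only ('3.8' -> (3, 8)); conda versions are richer."""
    return tuple(int(part) for part in text.split("."))


def matches(spec, installed):
    """Check a single 'name' or 'name <op><version>' term against {name: version}."""
    m = re.fullmatch(r"(\S+)(?:\s+(<=|>=|==|!=|<|>)(\S+))?", spec.strip())
    if m is None:
        raise ValueError(f"unsupported spec: {spec!r}")
    name, op, version = m.groups()
    if name not in installed:
        return False
    if op is None:
        return True  # a bare name only requires the package to be present
    return OPS[op](parse_version(installed[name]), parse_version(version))


def evaluate(condition, installed):
    """Evaluate 'a and b or c' as '(a and b) or c' (precedence is an assumption)."""
    return any(
        all(matches(term, installed) for term in clause.split(" and "))
        for clause in condition.split(" or ")
    )


print(evaluate("python <3.8 and numpy >=2.0", {"python": "3.7", "numpy": "2.1"}))  # True
print(evaluate("python >=3.8 or pypy", {"python": "3.7"}))  # False
```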

Note: the proposed functionality is already achieved in less elegant ways in the conda-forge distribution by creating multiple noarch packages with `__unix` or `__win` dependencies. This behavior is also conceptually similar to building multiple variants of a given package.
## Flags for the repodata

It's very natural to build different variants for a given package in the conda ecosystem with different properties: blas implementation, gpu / cuda version, and other variables make up the variant matrix for certain recipes.

However, it is not easy to specify which variant a user really wants in conda today. Most of the time, some string-matching on the build string is used to select one of the options, such as `pytorch 2.5.* *cuda`.

There are other workarounds by using `mutex` packages and constraining them such as `blas_impl * mkl` which could be used to select only packages that also depend on the MKL build.

However, it would be nice if we could have a flexible, powerful and simple syntax to enable or disable "flags" on packages in order to select a variant.

A RepodataRecord should get a new field "flags" that is a list of strings, such as:

Review comment:

How do flags and extras mix and overlap? Wouldn't conditional dependencies on a flag be enough to generate the extra category?

Review comment (Contributor):

Flags are part of a variant. So there is no variation of flags for a single variant. E.g. flags could be used to say a particular variant is using cuda and another is used to target cpu. Extras are a way to select additional dependencies for a particular variant. If a variant also adds a CLI tool it provides the extra "cli". Only if that extra is requested by another package are particular dependencies also requested.

In technical terms extras can indeed be implemented as conditional dependencies. E.g. for a package my_package we could express it as typer when my_package[cli]. If there is a package that depends on my_package[cli] typer would also be required.


```yaml
name: pytorch
version: "2.5.0"
# note these flags are free-form, and distributions are free to come up
# with their own set of flags
flags: ["gpu:cuda", "blas:mkl", "archspec:4", "release"]
```

Flags can then be matched using the following small syntax:

- `release`: only packages with `release` flag are used
- `~release`: disallow packages with `release` flag
- `?release`: if release flag available, filter on it, otherwise use any other
- `gpu:*`: any flag starting with `gpu:` will be matched

Review comment:

Can you add an example of the string matching for, say, `blas:mkl`?

Review comment (Contributor, Author):

Would an exact match not work fine?

Review comment:

It would; we should just state how to do an exact match.

Review comment (Contributor, Author):

You could just ask for `flags = ["blas:mkl"]` for an exact match :)

- `archspec:>2`: any flag starting with `archspec:` will be matched with everything trailing interpreted as a number and matched against the comparison operator
- `?archspec:>2`: if a flag starting with `archspec:` is found, match against this, otherwise ignore
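
A minimal sketch of how the flag patterns listed above might be applied to a list of candidate records. The names `flag_matches` and `select` are hypothetical. The sketch reads the `?` prefix as a soft preference over the whole candidate set (filter only if at least one candidate matches), which is one possible interpretation of the wording above and not something the CEP pins down.

```python
import operator
import re

COMPARATORS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
               ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def flag_matches(pattern, flag):
    """Match one record flag against a pattern without its '~' or '?' prefix."""
    key, sep, value = pattern.partition(":")
    if not sep:
        return flag == pattern  # plain flag such as 'release'
    flag_key, flag_sep, flag_value = flag.partition(":")
    if not flag_sep or flag_key != key:
        return False
    if value == "*":
        return True  # 'gpu:*' accepts any 'gpu:' flag
    m = re.fullmatch(r"(<=|>=|==|!=|<|>)(\d+(?:\.\d+)?)", value)
    if m:  # 'archspec:>2' compares the trailing number
        try:
            number = float(flag_value)
        except ValueError:
            return False
        return COMPARATORS[m.group(1)](number, float(m.group(2)))
    return flag_value == value  # exact match such as 'blas:mkl'


def select(records, patterns):
    """Filter candidate records (dicts with a 'flags' list) by flag patterns."""
    for pattern in patterns:
        optional = pattern.startswith("?")
        negated = pattern.startswith("~")
        body = pattern[1:] if optional or negated else pattern

        def has_match(record):
            return any(flag_matches(body, f) for f in record.get("flags", []))

        if negated:
            records = [r for r in records if not has_match(r)]
        else:
            hits = [r for r in records if has_match(r)]
            if hits or not optional:
                records = hits  # an optional pattern without hits changes nothing
    return records


candidates = [
    {"build": "cuda", "flags": ["gpu:cuda", "blas:mkl", "archspec:4", "release"]},
    {"build": "cpu", "flags": ["blas:openblas", "archspec:1", "debug"]},
]
print([r["build"] for r in select(candidates, ["gpu:*", "?release"])])  # ['cuda']
print([r["build"] for r in select(candidates, ["~release"])])  # ['cpu']
```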

In practice, this would look like the following from a user perspective:

```shell
conda install 'pytorch[version=">=3.1", flags=["gpu:*", "?release"]]'
```

## Backwards Compatibility

The new `repodata.v2` will be served alongside the current format under `repodata.v2.json`. Older conda clients will continue using the v1 format. Packages using any of the new features will not be added to v1 of `repodata.json`.
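
One way an index generator could apply this rule is sketched below. The helper names are hypothetical, and the sketch assumes that the v2 index contains every package while the v1 index simply omits any record that uses `extras`, `flags`, or a `; if` conditional dependency.

```python
NEW_FIELDS = ("extras", "flags")


def uses_v2_features(record):
    """True if the record relies on anything the v1 format cannot express."""
    if any(record.get(field) for field in NEW_FIELDS):
        return True
    return any("; if " in spec for spec in record.get("depends", []))


def split_index(records):
    """Return (v1_records, v2_records); v1 omits records that need v2 features."""
    v1 = [r for r in records if not uses_v2_features(r)]
    return v1, list(records)


records = [
    {"name": "six", "depends": ["python"]},
    {"name": "sqlalchemy", "depends": ["python >=3.8", "pywin32; if __win"]},
    {"name": "pytorch", "depends": [], "flags": ["gpu:cuda"]},
]
v1, v2 = split_index(records)
print([r["name"] for r in v1])  # ['six']
print(len(v2))  # 3
```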