8000 [pypi] Fix cleanup logic for multiple branches by hmeriann · Pull Request #16634 · duckdb/duckdb · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[pypi] Fix cleanup logic for multiple branches #16634

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

hmeriann
Copy link
Contributor
@hmeriann hmeriann commented Mar 13, 2025

Python Twine upload keeps failing nightly:

ERROR    HTTPError: 400 Bad Request from https://upload.pypi.org/legacy/        
         Project size too large. Limit for project 'duckdb' total size is 30 GB.
         See https://pypi.org/help/#project-size-limit    

The script we use to cleanup the space on pypi was designed for single branch, but now we upload from two branches. We need to delete old dev builds from both of them, but really do that only for dev builds created from main branch.

@carlopi suggested a cleanup logic to fix it and helped me to start up with implementation.

The fix is to update the cleanup process to run separately for each branch.
The logic is to shift from cleaning files older than X days on the main branch to cleaning files older than X/2 days on both the main and v1.2- branches.
This PR changes number of days to retain dev releases to 5 (was 10, but now it's 5 per both main and v1.2-), adds DUCKDB_BASE_VERSION env variable to filter out the pkg versions which we don't want to keep on pypi.

To test it run: PYPI_CLEANUP_USERNAME=user PYPI_CLEANUP_PASSWORD=psw PYPI_CLEANUP_OTP=otp DUCKDB_BASE_VERSION=<major.minor> python3 scripts/pypi_cleanup.py replacing <major.minor> with duckdb's base version (actual versions for now are 1.2 and 1.3 for main).

Should exit if DUCKDB_BASE_VERSION is empty, shorter than 3 symbols, contains more or less than 1 dots.
When the version is set to one of actual versions, expected that the value of pkg_vers variable contains the list of the dev releases names to delete.

@hmeriann
Copy link
Contributor Author

@Mytherin could you please delete some of the latest PRE-RELEASE versions from pypi? Otherwise it fails again, because there is no space for new uploads.

@hmeriann hmeriann requested a review from carlopi March 13, 2025 16:35
@hmeriann hmeriann changed the title 4394 invokeci python twine upload had failed on main [pypi] Fix cleanup logic for multiple branches Mar 13, 2025
Copy link
Contributor
@carlopi carlopi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

I think this is OK for now, and worth getting sorted out.

We might need to generalise somehow, probably a variable, a JSON string holding the mapping between tested branches AND targeted versions.
That would be a cool +1 to clean this up, but maybe first worth testing.

name: PyPi Release Cleanup
needs: twine-upload
runs-on: ubuntu-22.04
env:
PYPI_CLEANUP_USERNAME: 'mytherin'
PYPI_CLEANUP_PASSWORD: ${{secrets.PYPI_CLEANUP_PASSWORD}}
PYPI_CLEANUP_OTP: ${{secrets.PYPI_CLEANUP_OTP}}
DUCKDB_BASE_VERSION: ${{github.ref == 'refs/heads/main' && '1.3' || '1.2'}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is the weirder one, but I am not sure if / what is a more proper alternative. This would need to be revised / changed moving forward. I don't have a proper answer, possibly it's fine and to be generalised later (together with removing the hardcoded branch name), first checking this work as intended.

@carlopi
Copy link
Contributor
carlopi commented Mar 13, 2025

Actually, could you make this PR (only the 3 relevant commits) vs branch v1.2-histrionicus ?
That way we avoid duplication / workflow diverging.

@hmeriann hmeriann changed the base branch from main to v1.2-histrionicus March 13, 2025 17:16
@hmeriann hmeriann changed the base branch from v1.2-histrionicus to main March 13, 2025 17:16
@hmeriann hmeriann changed the base branch from main to v1.2-histrionicus March 13, 2025 17:24
@hmeriann hmeriann changed the base branch from v1.2-histrionicus to main March 13, 2025 17:25
…lean up builds created by both (release and main) branches
add check that duckdb version is set and has correct length;
filter pkg_vers using a pattern containing a duckdb_version_base
@hmeriann hmeriann force-pushed the 4394-invokeci-python-twine-upload-had-failed-on-main branch from c190329 to bf8ff3c Compare March 14, 2025 08:21
@hmeriann hmeriann changed the base branch from main to v1.2-histrionicus March 14, 2025 08:22
@duckdb-draftbot duckdb-draftbot marked this pull request as draft March 14, 2025 08:44
@hmeriann hmeriann marked this pull request as ready for review March 14, 2025 08:46
@Mytherin
Copy link
Collaborator
Mytherin commented Mar 14, 2025

Thanks for the PR!

This looks a bit brittle/error prone to me. I would suggest simplifying this and - instead of having the script behave differently depending on where it is run, we can make the script itself a bit more clever in which releases are deleted.

I would propose to modify the script to keep the last 3 dev releases of the next major version (1.3.X) and the last 3 dev releases of the next minor version (1.2.2-X). We can know what these are by looking at the latest non-dev release (currently v1.2.1). We can then also delete all dev releases of previous versions (i.e. when we release 1.2.1, we don't need to keep v1.2.1-dev... releases).

This moves the behavior from relying on "retention days" - which should also prevent us from deleting all releases when builds fail for a few days in a row, and further reduce brittleness here.

@hmeriann hmeriann requested review from carlopi and Mytherin March 17, 2025 14:52
@hmeriann
Copy link
Contributor Author

@Mytherin now the retention days are replaced with the number of versions to keep (3), so we don't need to pass the DUCKDB_BASE_VERSION variable to the script. Also, when a new version've been released, the script should remove all of its PRE_RELEASE versions.

Copy link
Collaborator
@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes! Looks good - I tried it locally and it mostly works. Some comments:

patterns = [re.compile(r".*\.dev\d+$")]
# how many dev versions to keep - 3 for each not yet released version
how_many_dev_versions_to_keep = 3
patterns = [re.compile(rf"^{duckdb_version_base}.*\.dev\d+$")]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{duckdb_version_base} shouldn't be here anymore right?

@Mytherin
Copy link
Collaborator

For testing, maybe we can expose the --dry parameter to this script. We don't need to login when not actually deleting, so that could then be run locally as well.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft March 17, 2025 15:56
when it passed to the script, it will output the list of the PRE-RELEASE versions to delete and return
@hmeriann hmeriann marked this pull request as ready for review March 18, 2025 10:47
@hmeriann
Copy link
Contributor Author
hmeriann commented Mar 18, 2025

@Mytherin thank you for the review and the idea to add the --dry parameter for testing.
I've updated the pr so now it should output the list of names to delete and return, when --dry is passed to the script. Added the comment about that as well

@Mytherin Mytherin changed the base branch from v1.2-histrionicus to main March 18, 2025 12:11
@Mytherin Mytherin merged commit 9fab868 into duckdb:main Mar 18, 2025
@Mytherin
Copy link
Collaborator

Thanks! LGTM now. I've merged it into main since you rebased it off of main (perhaps accidentally?)

@carlopi
Copy link
Contributor
carlopi commented Mar 18, 2025

I think either is good

Nice the more solid rework, it should also make possible to just run this every day, without actually checking if upload went through, but that's on a side.

@hmeriann
Copy link
Contributor Author

Thanks! LGTM now. I've merged it into main since you rebased it off of main (perhaps accidentally?)

Thanks! Yes, it was accidentally 🙈 I was going to update a branch, but not this one

@hmeriann
Copy link
Contributor Author
hmeriann commented Mar 18, 2025

I think either is good

Thank you for your review and help!

Mytherin added a commit that referenced this pull request Mar 27, 2025
… on PyPi the most recent dev versions (#16873)

In the PR #16634 I created a bug
which created a problem, when last nightly branch `v1.2-histironicus`
uploaded `1.2.2.dev132`, and then got deleted by
`scripts/pypi_cleanup.py` triggered by the main run ([run
link](https://github.com/duckdb/duckdb/actions/runs/14096235756/job/39498278728#step:5:18))
(this was raised by Boaz at MD)

The issue appeared because of the string sort, when `1.2.2.dev128` is
smaller than `1.2.2.dev96` since 1 < 9.

This PR adds a lambda function to sort by dev version number casted to
int.
Tested with decreased number of versions to keep and
1. append '1.2.2.dev132' value to list before sorting:
```
defaultdict(<class 'list'>, {'1.2.2': ['1.2.2.dev83', '1.2.2.dev96', '1.2.2.dev132'], '1.3.0': ['1.3.0.dev1734', '1.3.0.dev1738', '1.3.0.dev1793', '1.3.0.dev1894']})
Following pkg_vers can be deleted:  ['1.2.2.dev83', '1.3.0.dev1734', '1.3.0.dev1738']
```
2. insert '1.2.2.dev132' value to the beginning of the list before
sorting:
```
defaultdict(<class 'list'>, {'1.2.2': ['1.2.2.dev132', '1.2.2.dev83', '1.2.2.dev96'], '1.3.0': ['1.3.0.dev1734', '1.3.0.dev1738', '1.3.0.dev1793', '1.3.0.dev1894']})
Following pkg_vers can be deleted:  ['1.2.2.dev83', '1.3.0.dev1734', '1.3.0.dev1738']
```
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
Add support for `ALTER TABLE tbl SET SORTED BY (key1 DESC, key2, ...)` in the grammar (duckdb/duckdb#16714)
[pypi] Fix cleanup logic for multiple branches (duckdb/duckdb#16634)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
Add support for `ALTER TABLE tbl SET SORTED BY (key1 DESC, key2, ...)` in the grammar (duckdb/duckdb#16714)
[pypi] Fix cleanup logic for multiple branches (duckdb/duckdb#16634)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 16, 2025
Add support for `ALTER TABLE tbl SET SORTED BY (key1 DESC, key2, ...)` in the grammar (duckdb/duckdb#16714)
[pypi] Fix cleanup logic for multiple branches (duckdb/duckdb#16634)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 16, 2025
Add support for `ALTER TABLE tbl SET SORTED BY (key1 DESC, key2, ...)` in the grammar (duckdb/duckdb#16714)
[pypi] Fix cleanup logic for multiple branches (duckdb/duckdb#16634)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025
Add support for `ALTER TABLE tbl SET SORTED BY (key1 DESC, key2, ...)` in the grammar (duckdb/duckdb#16714)
[pypi] Fix cleanup logic for multiple branches (duckdb/duckdb#16634)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0