-
Notifications
You must be signed in to change notification settings - Fork 2.4k
[pypi] Fix cleanup logic for multiple branches #16634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pypi] Fix cleanup logic for multiple branches #16634
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
I think this is OK for now, and worth getting sorted out.
We might need to generalise somehow, probably a variable, a JSON string holding the mapping between tested branches AND targeted versions.
That would be a cool +1 to clean this up, but maybe first worth testing.
.github/workflows/Python.yml
Outdated
name: PyPi Release Cleanup | ||
needs: twine-upload | ||
runs-on: ubuntu-22.04 | ||
env: | ||
PYPI_CLEANUP_USERNAME: 'mytherin' | ||
PYPI_CLEANUP_PASSWORD: ${{secrets.PYPI_CLEANUP_PASSWORD}} | ||
PYPI_CLEANUP_OTP: ${{secrets.PYPI_CLEANUP_OTP}} | ||
DUCKDB_BASE_VERSION: ${{github.ref == 'refs/heads/main' && '1.3' || '1.2'}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is the weirder one, but I am not sure if / what is a more proper alternative. This would need to be revised / changed moving forward. I don't have a proper answer, possibly it's fine and to be generalised later (together with removing the hardcoded branch name), first checking this work as intended.
Actually, could you make this PR (only the 3 relevant commits) vs branch |
…lean up builds created by both (release and main) branches
add check that duckdb version is set and has correct length; filter pkg_vers using a pattern containing a duckdb_version_base
c190329
to
bf8ff3c
Compare
Thanks for the PR! This looks a bit brittle/error prone to me. I would suggest simplifying this and - instead of having the script behave differently depending on where it is run, we can make the script itself a bit more clever in which releases are deleted. I would propose to modify the script to keep the last 3 dev releases of the next major version ( This moves the behavior from relying on "retention days" - which should also prevent us from deleting all releases when builds fail for a few days in a row, and further reduce brittleness here. |
when there is a release, remove all its pre-release versions
… cleanup script anymore
@Mytherin now the retention days are replaced with the number of versions to keep (3), so we don't need to pass the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes! Looks good - I tried it locally and it mostly works. Some comments:
scripts/pypi_cleanup.py
Outdated
patterns = [re.compile(r".*\.dev\d+$")] | ||
# how many dev versions to keep - 3 for each not yet released version | ||
how_many_dev_versions_to_keep = 3 | ||
patterns = [re.compile(rf"^{duckdb_version_base}.*\.dev\d+$")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{duckdb_version_base}
shouldn't be here anymore right?
For testing, maybe we can expose the |
when it passed to the script, it will output the list of the PRE-RELEASE versions to delete and return
@Mytherin thank you for the review and the idea to add the |
…n-twine-upload-had-failed-on-main
Thanks! LGTM now. I've merged it into |
I think either is good Nice the more solid rework, it should also make possible to just run this every day, without actually checking if upload went through, but that's on a side. |
Thanks! Yes, it was accidentally 🙈 I was going to update a branch, but not this one |
Thank you for your review and help! |
… on PyPi the most recent dev versions (#16873) In the PR #16634 I created a bug which created a problem, when last nightly branch `v1.2-histironicus` uploaded `1.2.2.dev132`, and then got deleted by `scripts/pypi_cleanup.py` triggered by the main run ([run link](https://github.com/duckdb/duckdb/actions/runs/14096235756/job/39498278728#step:5:18)) (this was raised by Boaz at MD) The issue appeared because of the string sort, when `1.2.2.dev128` is smaller than `1.2.2.dev96` since 1 < 9. This PR adds a lambda function to sort by dev version number casted to int. Tested with decreased number of versions to keep and 1. append '1.2.2.dev132' value to list before sorting: ``` defaultdict(<class 'list'>, {'1.2.2': ['1.2.2.dev83', '1.2.2.dev96', '1.2.2.dev132'], '1.3.0': ['1.3.0.dev1734', '1.3.0.dev1738', '1.3.0.dev1793', '1.3.0.dev1894']}) Following pkg_vers can be deleted: ['1.2.2.dev83', '1.3.0.dev1734', '1.3.0.dev1738'] ``` 2. insert '1.2.2.dev132' value to the beginning of the list before sorting: ``` defaultdict(<class 'list'>, {'1.2.2': ['1.2.2.dev132', '1.2.2.dev83', '1.2.2.dev96'], '1.3.0': ['1.3.0.dev1734', '1.3.0.dev1738', '1.3.0.dev1793', '1.3.0.dev1894']}) Following pkg_vers can be deleted: ['1.2.2.dev83', '1.3.0.dev1734', '1.3.0.dev1738'] ```
Add support for `ALTER TABLE tbl SET SORTED BY (key1 DESC, key2, ...)` in the grammar (duckdb/duckdb#16714) [pypi] Fix cleanup logic for multiple branches (duckdb/duckdb#16634)
Add support for `ALTER TABLE tbl SET SORTED BY (key1 DESC, key2, ...)` in the grammar (duckdb/duckdb#16714) [pypi] Fix cleanup logic for multiple branches (duckdb/duckdb#16634)
Add support for `ALTER TABLE tbl SET SORTED BY (key1 DESC, key2, ...)` in the grammar (duckdb/duckdb#16714) [pypi] Fix cleanup logic for multiple branches (duckdb/duckdb#16634)
Add support for `ALTER TABLE tbl SET SORTED BY (key1 DESC, key2, ...)` in the grammar (duckdb/duckdb#16714) [pypi] Fix cleanup logic for multiple branches (duckdb/duckdb#16634)
Add support for `ALTER TABLE tbl SET SORTED BY (key1 DESC, key2, ...)` in the grammar (duckdb/duckdb#16714) [pypi] Fix cleanup logic for multiple branches (duckdb/duckdb#16634)
Python Twine upload keeps failing nightly:
The script we use to cleanup the space on pypi was designed for single branch, but now we upload from two branches. We need to delete old dev builds from both of them, but really do that only for dev builds created from
main
branch.@carlopi suggested a cleanup logic to fix it and helped me to start up with implementation.
The fix is to update the cleanup process to run separately for each branch.
The logic is to shift from cleaning files older than X days on the
main
branch to cleaning files older than X/2 days on both themain
andv1.2-
branches.This PR changes number of days to retain dev releases to 5 (was 10, but now it's 5 per both main and v1.2-), adds
DUCKDB_BASE_VERSION
env variable to filter out the pkg versions which we don't want to keep on pypi.To test it run:
PYPI_CLEANUP_USERNAME=user PYPI_CLEANUP_PASSWORD=psw PYPI_CLEANUP_OTP=otp DUCKDB_BASE_VERSION=<major.minor> python3 scripts/pypi_cleanup.py
replacing<major.minor>
with duckdb's base version (actual versions for now are 1.2 and 1.3 formain
).Should exit if
DUCKDB_BASE_VERSION
is empty, shorter than 3 symbols, contains more or less than 1 dots.When the version is set to one of actual versions, expected that the value of
pkg_vers
variable contains the list of the dev releases names to delete.