
[RFC] PyTorch next wheel build platform: manylinux-2.28 #123649


Closed
5 tasks done
atalman opened this issue Apr 9, 2024 · 27 comments
Assignees
Labels
module: binaries (Anything related to official binaries that we release to users)
oncall: releng (In support of CI and Release Engineering)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)

Comments

@atalman
Contributor
atalman commented Apr 9, 2024

🚀 The feature, motivation and pitch

CentOS EOL is scheduled for the summer: https://www.redhat.com/en/blog/fastest-road-centos-linux-red-hat-enterprise-linux

The official manylinux has moved to AlmaLinux: https://github.com/pypa/manylinux/blob/main/docker/Dockerfile#L2

manylinux_2_28 (AlmaLinux 8 based)
Toolchain: GCC 12
x86_64 image: quay.io/pypa/manylinux_2_28_x86_64
aarch64 image: quay.io/pypa/manylinux_2_28_aarch64
ppc64le image: quay.io/pypa/manylinux_2_28_ppc64le
s390x image: quay.io/pypa/manylinux_2_28_s390x

Almalinux:
https://almalinux.org/

Opening this issue to gather feedback and decide on the timeline for migrating the wheel build images.

Work Completed:

RFC: Discussion issue #126551

cc @seemethere @malfet @osalpekar @pytorch/pytorch-dev-infra @ptrblck

@huydhn
Contributor
huydhn commented Apr 9, 2024

@AlekseiNikiforovIBM recently added the docker build image for s390x with the RedHat 9 base image from https://catalog.redhat.com/software/containers/ubi9/ubi/615bcf606feffc5384e8452e?architecture=amd64&image=65e093e60a21b531a96f93ca.

But going with the official manylinux suggestion sounds like a good choice too, as long as they have all the different flavors that we need and are compatible with RedHat.

@peri044
Contributor
peri044 commented Apr 9, 2024

Torch-TRT used to use the manylinux (CentOS 7 based) containers for build and tests. Currently, TensorRT 10 only supports RHEL 9 and above (with glibc 2.28+ https://pypi.org/project/tensorrt/10.0.0b6/#files). It would be great if PyTorch supported these latest AlmaLinux-based releases soon, as they would be compatible. This feature is of high interest to us and would unblock our workflow.

@seemethere
Member

I don't think we should base on RHEL considering getting access to RHEL images is based on if you have a subscription or not (or at least that's been my experience, happy to be wrong here).

I'd say we should go with whatever the official upstream manylinux distribution is utilizing. If that's almalinux then we go with that, if it's something else then we should go with what they're going with.

@seemethere seemethere added the oncall: releng In support of CI and Release Engineering label Apr 9, 2024
@ezyang ezyang added module: binaries Anything related to official binaries that we release to users triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Apr 10, 2024
@malfet
Contributor
malfet commented Apr 10, 2024

manylinux-2.28 is the next standard we should be moving to. And perhaps it's also time to kill the PRE_CXX11 ABI (which was necessary for all manylinux standards prior to 2.28)
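
For context on telling the two ABIs apart: with libstdc++, the newer ABI places `std::string` and `std::list` in the inline namespace `std::__cxx11`, so mangled symbol names from such builds tend to contain `__cxx11` or the `B5cxx11` ABI tag. Below is a minimal sketch of a heuristic check over a list of symbol names (for example, collected from `nm -D` output). The function name and the sample symbols are illustrative assumptions and do not come from the PyTorch build scripts.

```python
from typing import Iterable

# Substrings that libstdc++ leaves in mangled names under the newer (C++11) ABI.
CXX11_MARKERS = ("__cxx11", "B5cxx11")


def mentions_cxx11_abi(symbols: Iterable[str]) -> bool:
    """Return True if any mangled symbol name carries a C++11-ABI marker.

    This is only a heuristic: a library that never exposes std::string or
    std::list in its interface shows no marker under either ABI.
    """
    return any(marker in sym for sym in symbols for marker in CXX11_MARKERS)


# Hypothetical inputs: one new-ABI std::string symbol, one old-ABI one.
new_abi_symbol = "_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_createERmm"
old_abi_symbol = "_ZNSs4_Rep10_M_destroyERKSaIcE"
```

A `False` result does not prove that a library uses the old ABI; it only means that none of the inspected symbols mention the new one.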

@seemethere
Member

manylinux-2.28 is the next standard we should be moving to. And perhaps it's also time to kill the PRE_CXX11 ABI (which was necessary for all manylinux standards prior to 2.28)

I'd agree here, no need to keep dinosaurs around if we're moving to the future!

@zeroepoch

TensorRT is moving back to manylinux_2_17 after some pushback from customers. This shouldn't create issues for projects like PyTorch that consume TRT from manylinux_2_28 builds, thanks to glibc backward compatibility. The CXX ABI change is also being discussed (slaying dinosaurs).

@ZolotukhinM

manylinux-2.28 is the next standard we should be moving to. And perhaps it's also time to kill the PRE_CXX11 ABI (which was necessary for all manylinux standards prior to 2.28)

Are there any estimated timelines for when this can happen? IIUC most of the current builds are still targeting the CXX03 ABI (from looking at https://download.pytorch.org/whl/torch/)

@atalman atalman changed the title [RFC] PyTorch next wheel build platform [RFC] PyTorch next wheel build platform: manylinux-2.28 Apr 23, 2024
@snadampal
Collaborator

I have the PRs for aarch64-linux platform CD migration to manylinux 2_28.
pytorch/builder#1784
pytorch/builder#1781

But I see that a few OS distros (for example, Amazon Linux 2) ship glibc 2.26, and switching to manylinux_2_28 will break compatibility with them. I would suggest we first announce the timeline for this switch to avoid surprises and disappointment for customers.
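
To make the compatibility concern concrete: under PEP 600, a `manylinux_X_Y_<arch>` platform tag promises that the wheel works with glibc X.Y or newer, and the legacy names `manylinux1`, `manylinux2010`, and `manylinux2014` are aliases for 2.5, 2.12, and 2.17. The sketch below compares a tag against a system glibc version. The helper names are hypothetical, and real installers such as pip apply additional checks that this does not model.

```python
import re
from typing import Optional, Tuple

# PEP 600 style tag: manylinux_<glibc major>_<glibc minor>_<arch>
_TAG_RE = re.compile(r"^manylinux_(\d+)_(\d+)_(\w+)$")

# Legacy aliases and the glibc versions they correspond to (per PEP 600).
_LEGACY = {"manylinux1": (2, 5), "manylinux2010": (2, 12), "manylinux2014": (2, 17)}


def required_glibc(platform_tag: str) -> Optional[Tuple[int, int]]:
    """Return the minimum glibc version implied by a manylinux tag, else None."""
    match = _TAG_RE.match(platform_tag)
    if match:
        return int(match.group(1)), int(match.group(2))
    for alias, version in _LEGACY.items():
        if platform_tag.startswith(alias + "_"):
            return version
    return None  # not a manylinux tag, e.g. plain "linux_x86_64"


def glibc_satisfies(platform_tag: str, system_glibc: Tuple[int, int]) -> bool:
    """True if the system glibc is at least what the tag requires."""
    needed = required_glibc(platform_tag)
    return needed is not None and system_glibc >= needed
```

For example, `glibc_satisfies("manylinux_2_28_x86_64", (2, 26))` is `False`, which matches the Amazon Linux 2 case described above, while `glibc_satisfies("manylinux2014_x86_64", (2, 26))` is `True`.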

@atalman
Contributor Author
atalman commented May 10, 2024

@snadampal @malfet @seemethere
I suggest the following option: still use the old wheels for 2.4, and announce the deprecation of the old wheels during that release. For release 2.5, use only the new wheels. This will give Amazon Linux 2 users time to prepare.

As per: https://aws.amazon.com/amazon-linux-2/faqs/
Amazon Linux 2 EOL: 2025-06-30

@huydhn
Contributor
huydhn commented May 10, 2024

Our CI is still on Amazon Linux 2, so I guess we need to migrate it to Amazon Linux 2023 before the new wheel can even pass validation.

@atalman
Contributor Author
atalman commented May 13, 2024

@snadampal You are correct: a wheel built with manylinux_2_28 is not compatible with Amazon Linux 2. Here is the test:

(py38) [ec2-user@ip-10-0-9-21 temp]$ pip install torch-2.4.0.dev20240513%2Bcpu-cp38-cp38-linux_x86_64.whl
Processing ./torch-2.4.0.dev20240513%2Bcpu-cp38-cp38-linux_x86_64.whl
Collecting filelock (from torch==2.4.0.dev20240513+cpu)
  Downloading filelock-3.14.0-py3-none-any.whl.metadata (2.8 kB)
Collecting typing-extensions>=4.8.0 (from torch==2.4.0.dev20240513+cpu)
  Downloading typing_extensions-4.11.0-py3-none-any.whl.metadata (3.0 kB)
Collecting sympy (from torch==2.4.0.dev20240513+cpu)
  Downloading sympy-1.12-py3-none-any.whl.metadata (12 kB)
Collecting networkx (from torch==2.4.0.dev20240513+cpu)
  Downloading networkx-3.1-py3-none-any.whl.metadata (5.3 kB)
Collecting jinja2 (from torch==2.4.0.dev20240513+cpu)
  Downloading jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting fsspec (from torch==2.4.0.dev20240513+cpu)
  Downloading fsspec-2024.3.1-py3-none-any.whl.metadata (6.8 kB)
Collecting MarkupSafe>=2.0 (from jinja2->torch==2.4.0.dev20240513+cpu)
  Downloading MarkupSafe-2.1.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Collecting mpmath>=0.19 (from sympy->torch==2.4.0.dev20240513+cpu)
  Downloading mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Downloading typing_extensions-4.11.0-py3-none-any.whl (34 kB)
Downloading filelock-3.14.0-py3-none-any.whl (12 kB)
Downloading fsspec-2024.3.1-py3-none-any.whl (171 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 172.0/172.0 kB 23.3 MB/s eta 0:00:00
Downloading jinja2-3.1.4-py3-none-any.whl (133 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.3/133.3 kB 18.4 MB/s eta 0:00:00
Downloading networkx-3.1-py3-none-any.whl (2.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 90.7 MB/s eta 0:00:00
Downloading sympy-1.12-py3-none-any.whl (5.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 112.1 MB/s eta 0:00:00
Downloading MarkupSafe-2.1.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26 kB)
Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 50.5 MB/s eta 0:00:00
Installing collected packages: mpmath, typing-extensions, sympy, networkx, MarkupSafe, fsspec, filelock, jinja2, torch
Successfully installed MarkupSafe-2.1.5 filelock-3.14.0 fsspec-2024.3.1 jinja2-3.1.4 mpmath-1.3.0 networkx-3.1 sympy-1.12 torch-2.4.0.dev20240513+cpu typing-extensions-4.11.0
(py38) [ec2-user@ip-10-0-9-21 temp]$ conda list
# packages in environment at /home/ec2-user/miniconda3/envs/py38:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
ca-certificates           2024.3.11            h06a4308_0  
filelock                  3.14.0                   pypi_0    pypi
fsspec                    2024.3.1                 pypi_0    pypi
jinja2                    3.1.4                    pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.4.4                h6a678d5_1  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libstdcxx-ng              11.2.0               h1234567_1  
markupsafe                2.1.5                    pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0  
networkx                  3.1                      pypi_0    pypi
openssl                   3.0.13               h7f8727e_1  
pip                       24.0             py38h06a4308_0  
python                    3.8.19               h955ad1f_0  
readline                  8.2                  h5eee18b_0  
setuptools                69.5.1           py38h06a4308_0  
sqlite                    3.45.3               h5eee18b_0  
sympy                     1.12                     pypi_0    pypi
tk                        8.6.14               h39e8969_0  
torch                     2.4.0.dev20240513+cpu          pypi_0    pypi
typing-extensions         4.11.0                   pypi_0    pypi
wheel                     0.43.0           py38h06a4308_0  
xz                        5.4.6                h5eee18b_1  
zlib                      1.2.13               h5eee18b_1  

...
(py38) [ec2-user@ip-10-0-9-21 test]$ python smoke_test/smoke_test.py 
Traceback (most recent call last):
  File "smoke_test/smoke_test.py", line 5, in <module>
    import torch
  File "/home/ec2-user/miniconda3/envs/py38/lib/python3.8/site-packages/torch/__init__.py", line 240, in <module>
    from torch._C import *  # noqa: F403
ImportError: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /home/ec2-user/miniconda3/envs/py38/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
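
Note that the wheel in this log carries the generic `linux_x86_64` tag, which presumably is why pip installed it without complaint and the failure only surfaced at import time. A minimal sketch of a pre-flight check that a smoke test could run before `import torch` follows; it assumes the standard library's `platform.libc_ver()` reports the interpreter's glibc, the helper names are hypothetical, and the `(2, 28)` default is taken from the error message above.

```python
import platform
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Turn a version string such as '2.26' into a comparable (major, minor) tuple."""
    parts = text.split(".")
    minor = int(parts[1]) if len(parts) > 1 else 0
    return int(parts[0]), minor


def system_glibc() -> Optional[Tuple[int, int]]:
    """Return the glibc version Python reports, or None on non-glibc systems."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    return parse_version(version)


def meets_minimum(current: Optional[Tuple[int, int]],
                  minimum: Tuple[int, int] = (2, 28)) -> bool:
    """True if a known glibc version is at least the required minimum."""
    return current is not None and current >= minimum
```

On the Amazon Linux 2 host above, `meets_minimum(system_glibc())` would be expected to return `False`, giving a clearer message than the `GLIBC_2.28' not found` import error.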

@chuanqi129
Collaborator
chuanqi129 commented Jun 18, 2024

Hi @atalman, we're working on enabling XPU nightly manylinux wheel builds, and for XPU we also need the manylinux_2_28 environment. However, I found that the official quay.io/pypa/manylinux_2_28_x86_64 Docker image doesn't provide the shared library libpython*.so for the Python versions under /opt/python (see pypa/manylinux#255), so we can't use it to build the PyTorch Python wheel directly; we may still need to rebuild Python ourselves. What do you think?
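
One way to confirm whether a given interpreter was built with a shared libpython is to query its build configuration through the standard `sysconfig` module. This is a small sketch; the function name is hypothetical, and the `Py_ENABLE_SHARED`, `LDLIBRARY`, and `LIBDIR` config variables are assumed to be populated, which is generally the case on Linux builds of CPython.

```python
import sysconfig


def python_shared_library_info() -> dict:
    """Report whether this interpreter was configured with --enable-shared."""
    return {
        # 1 when built with --enable-shared, 0 (or missing) otherwise
        "enable_shared": bool(sysconfig.get_config_var("Py_ENABLE_SHARED")),
        # File name of the Python library this build links against
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
        # Directory where that library is expected to be installed
        "libdir": sysconfig.get_config_var("LIBDIR"),
    }
```

Running this with each interpreter under /opt/python in the image would show which builds, if any, report `enable_shared` as `True`.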

@EikanWang
Collaborator

For Intel GPU, we will submit a PR to support ABI=0 if PT2.5 cannot switch to ABI=1 mode. Because the ABI change in the Intel GPU software breaks backward compatibility, we will refine the CMake a little bit: #130110, FYI.

@jithunnair-amd
Collaborator
jithunnair-amd commented Jul 25, 2024

@atalman @malfet @albanD @seemethere Actually, do we need to support the _GLIBCXX_USE_CXX11_ABI=0 case for PyTorch wheels, or can we sunset that requirement?
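
For anyone checking which mode an installed wheel uses, `torch.compiled_with_cxx11_abi()` reports whether the binary was built with `_GLIBCXX_USE_CXX11_ABI=1`. The wrapper below is a hedged sketch: the wrapper name is hypothetical, and it returns `None` when torch cannot be imported rather than raising.

```python
from typing import Optional


def torch_cxx11_abi() -> Optional[bool]:
    """Return the ABI flag torch was compiled with, or None if torch is unavailable."""
    try:
        import torch  # imported lazily so the check degrades gracefully
    except ImportError:
        return None
    return bool(torch.compiled_with_cxx11_abi())
```

C++ extensions generally need to be compiled with the same setting as the torch binary they link against, so a check like this can help catch mismatches early.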

@malfet
Contributor
malfet commented Jul 25, 2024

@jithunnair-amd that depends on who your customers are. If you believe that this is fine for ROCm, please do not hesitate to propose a patch

@jithunnair-amd
Collaborator

@jithunnair-amd that depends on who your customers are. If you believe that this is fine for ROCm, please do not hesitate to propose a patch

Okay, I think we can start with setting DESIRED_DEVTOOLSET: cxx11-abi in

and see if there's any community motivation to stay on pre-cxx11-abi symbols. I'll check internally regarding customers as well.

pobin6 pushed a commit to pobin6/pytorch that referenced this issue Dec 5, 2024
…CUDA11.8, CUDA12.4 (pytorch#141565)

For release 2.6 we will be using only CUDA 12.6 binaries on Manylinux 2.28.
Issue: pytorch#123649
Pull Request resolved: pytorch#141565
Approved by: https://github.com/Skylion007, https://github.com/huydhn, https://github.com/malfet
@jithunnair-amd
Collaborator

ROCm PRs to move to manylinux2_28:

  • update GCC_ABI check to expect 1 for ROCm manylinux2_28 wheels: 2043
  • ROCm manylinux2_28 manylinux images: 140681
  • upgrade to gcc11 for manylinux2_28 images: 141609
  • ROCm manylinux2_28 PyTorch binaries - 141423
  • ROCm manylinux2_28 PyTorch ecosystem binaries - 6016

fmo-mt pushed a commit to fmo-mt/pytorch that referenced this issue Dec 11, 2024
Fixes pytorch#123649
Use Manylinux 2_28 Docker builds for PyTorch Nightly builds

This moves the wheels to a Docker image that uses ``quay.io/pypa/manylinux_2_28_x86_64`` as a base rather than ``centos:7``, which is EOL as of June 30, 2024.

Information:
https://github.com/pypa/manylinux#manylinux_2_28-almalinux-8-based

manylinux_2_28 (AlmaLinux 8 based)
Toolchain: GCC 13
Built wheels are also expected to be compatible with other distros using glibc 2.28 or later, including:
Debian 10+
Ubuntu 18.10+
Fedora 29+
CentOS/RHEL 8+

This migration should enable us to migrate to latest CUDNN version, and land this PR: pytorch#137978

Pull Request resolved: pytorch#138732
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/huydhn
@atalman atalman removed this from the 2.6.0 milestone Dec 17, 2024
@atalman atalman pinned this issue Dec 17, 2024
@atalman
Contributor Author
atalman commented Dec 17, 2024

Switching PyTorch nightlies to Manylinux 2.28 builds and cxx11-abi by default: #143423

pytorchmergebot pushed a commit that referenced this issue Feb 28, 2025
We're dropping regular old manylinux so let's drop it here too

Relates to #123649

Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Pull Request resolved: #148129
Approved by: https://github.com/Camyll, https://github.com/huydhn, https://github.com/malfet, https://github.com/atalman
ghstack dependencies: #148126
atalman added a commit to pytorch/test-infra that referenced this issue Feb 28, 2025
1. Migrate all the workflows to linux_job_v2 since it uses Manylinux 2.28: RFC: pytorch/pytorch#123649
2. Remove stronghold workflow, since not used
3. Use get_stable_cuda_version to validate repacked binary size whls
majing921201 pushed a commit to majing921201/pytorch that referenced this issue Mar 4, 2025
We're dropping regular old manylinux so let's drop it here too

Relates to pytorch#123649

Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Pull Request resolved: pytorch#148129
Approved by: https://github.com/Camyll, https://github.com/huydhn, https://github.com/malfet, https://github.com/atalman
ghstack dependencies: pytorch#148126
@MeetVadakkanchery MeetVadakkanchery unpinned this issue Mar 6, 2025
@atalman
Contributor Author
atalman commented Apr 2, 2025

Removed workaround created in test-infra repo to support Manylinux 2014 workers: pytorch/test-infra#6491

@atalman
Contributor Author
atalman commented Apr 29, 2025

Closing this as completed

@atalman atalman closed this as completed Apr 29, 2025
pytorchmergebot pushed a commit that referenced this issue Apr 29, 2025
Related to Manylinux 2.28 migration: #123649
Cleanup old Docker files and `manylinuxaarch64-builder:cpu-aarch64` image which has been replaced by `manylinux2_28_aarch64-builder:cpu-aarch64`
Pull Request resolved: #152428
Approved by: https://github.com/Skylion007, https://github.com/malfet