Releases: dell/omnia
Omnia 1.7.1
New Features:
- Platform enablement of AMD 17G PowerEdge servers - R6725, R7725, R6715, R7715
- New operating system support - Ubuntu 24.04
- Enablement of Intel Gaudi 3 accelerator on Ubuntu 24.04 & 22.04 OS
- Enablement of NVIDIA accelerators - L40s, H100 NVL, H200 SXM
- NVIDIA Collective Communications Library (NCCL) 2.25.1 on nodes running Ubuntu 24.04 OS
- NVIDIA GPU operator (25.3.0) on nodes running Ubuntu 24.04 OS
- Support for ROCm Communication Collectives Library (RCCL) 2.21.5 on nodes with AMD accelerators
- Support for RoCE configuration with Calico network plugin
- Ability to add external nodes (with pre-loaded OS and internet connectivity) to a Kubernetes (K8s) cluster
- Addition of Multus-CNI plugin (4.1.4) and Whereabouts plugin (0.8.0) for Kubernetes (K8s)
- Ability to configure additional NICs and update kernel parameters during compute node provisioning
- Upgrade support on Omnia Infrastructure Manager (OIM) from v1.7 to v1.7.1
- Software stack updates:
  - Intel Gaudi driver - 1.19.2
  - Kubernetes - 1.31.4
  - Kubespray - 2.27
  - CSI PowerScale driver - 2.13.0
  - NVIDIA CUDA - 12.8
  - NVIDIA vLLM - 0.7.2
  - AMD ROCm - 6.3.1
  - Grafana - 11.4.1
  - BCM RoCE - 232.1.133.2
  - Jinja - 3.1.6
Documentation Enhancements:
- Grouping by OS version in the Software Installed by Omnia subsection
- Addition of the Unsupported packages based on cluster OS subsection
Omnia 1.7.1-rc2
- Kubernetes version 1.31.4 support
- Ubuntu 24.04 support with netplan configuration
- IP rule assignment playbook integration with server spec update
- Server spec update & kernel parameters update while provisioning
- Security vulnerability fix - jinja2 updated to version 3.1.5
- Xilinx device plugin downgraded to version 1.2.0
- Fixed broken links for Rocky OS.
Omnia 1.7.1-rc1
What’s New in this pre-release:
- Kubernetes version 1.31.4 support
- Ubuntu 24.04 support
- Security vulnerability fix - jinja2 updated to version 3.1.5
- Xilinx device plugin downgraded to version 1.2.0
- Fixed broken links for Rocky OS.
Omnia 1.7
Note: A fix for the broken symlink in Omnia 1.7 that blocked deployments on Rocky Linux clusters is available in Omnia 1.7.1-rc2 (pre-release) and later releases. We recommend that Rocky Linux users upgrade to Omnia 1.7.1-rc2 or later for smooth cluster deployments.
What’s New in this release:
- Refresh for XE9680 with AMD MI300X accelerators and PowerSwitch Z9864F-based network architecture
- Pre-enablement for XE9680 with Intel Gaudi 3 accelerators
- NVIDIA container toolkit for NVIDIA accelerators
- Installation of Kubernetes stack v1.29
- Sample playbook for a pre-trained Generative AI model - Llama 3.1
- CSI drivers for Kubernetes to access PowerScale storage
- Internal OpenLDAP server configuration as a proxy server
- Corporate proxy on RHEL, Rocky Linux, and Ubuntu clusters
- Omnia execution within a virtual environment with Python 3.11 and Ansible 9.5.1
- Setting OS kernel command-line parameters using the server_spec_update utility
- Revamped Omnia documentation featuring OS-specific install guides, deployment-flow diagram, and other enhancements
Omnia 1.6.1
This patch release focuses on fixing the following issue:
- The dependent package ‘libssl1.1_1.1.1f-1ubuntu2.22_amd64’ required by Omnia 1.6 is no longer available for Ubuntu 22.04 OS.
Note:
- With Omnia 1.6.1, new cluster deployments will encounter a TLS CA certificate error with OpenLDAP due to changes in the dependent package ‘openldaptoolbox’. To resolve this, we recommend using Omnia 1.7 for new cluster deployments with OpenLDAP.
- A critical security vulnerability in the cryptography software used by Omnia versions 1.6.1 and earlier has been resolved in Omnia 1.7 by updating the cryptography software to version 44.0.0. We recommend that users upgrade to Omnia 1.7.
Omnia 1.6
This release has been deprecated because the dependent package ‘libssl1.1_1.1.1f-1ubuntu2.22_amd64’ is no longer available for Ubuntu 22.04.
Note: Before running local repo in an Omnia 1.6 production environment on Ubuntu 22.04 OS, apply the fix by following the upgrade flow of Omnia 1.6.1.
Omnia has been enhanced to offer:
- Hardware Enablement
  - Enablement for AI workloads on XE9680 with AMD MI300X GPUs
- OS enablement
- Enablement for AI
  - Install GPU device plugin for Kubernetes
    - GPU device plugin for AMD
    - GPU device plugin for NVIDIA
- Additional Features
  - One-off utility to add or remove a node
  - HPC/AI cluster inventory partitioning
    - CPU inventory
    - AMD GPU inventory
    - NVIDIA GPU inventory
Omnia 1.5.1
This patch release focuses on the following changes:
- Installation of Kubernetes 1.16 and 1.19 is deprecated.
- Spark Operator support is deprecated.
- Omnia now installs Kubernetes 1.26.
Kubeflow is not supported on v1.5.1 due to the Kubernetes upgrade.
Omnia 1.4.3.1
This release focuses on supporting the following features:
- Hardware Support: Intel E810 NIC, ConnectX-5/6 NICs.
- Omnia GitHub now hosts a “genesis” image with this functionality baked in for initial bootup.
- Host aliasing for Scheduler and IPA authentication.
- Login and Manager Node access from both public and private NICs.
- Validation check enhancements:
  - Rearranged to occur as early as possible.
  - Isolated when running smaller playbooks.
- Added a Benchmark Install Guide: OneAPI for Intel, MPI AOCC HPL for AMD.
Omnia 1.5
** This release is now deprecated. Kubernetes v1.16 and v1.19 are no longer available for deployment. **
This release focuses on supporting the following features:
- Expanded telemetry collection to cover regular, health check, and GPU metrics.
- Rsyslog: Added the ability to aggregate logs via xCAT’s syslog.
- Integration of apptainer for containerized HPC benchmark execution.
- Optimized installation of Visualization Dashboard and Log Aggregator Tool.
Omnia 1.4.3
This release focuses on supporting the following features:
- XE 9640, R760 XA, and R760 XD2 are now supported as control planes or target nodes with NVIDIA H100 accelerators.
- Added the ability to configure split ports on NVIDIA Quantum-2-based QM9700 (NVIDIA InfiniBand NDR400 switches).
- Extended password-less SSH support for multiple user configuration in a single execution.
- Input mapping files and inventory files now support commented entries for customized playbook execution.
- NFS share is now available for hosting user home directories within the cluster.