Tags: penggu/kueue
Update version references to v0.5.0 (kubernetes-sigs#1262) Change-Id: I0027f2027dd8c73e24f7c6611733921de8d179da
Kueue v0.5.0

Changes since `v0.4.0`:

Changes by Kind

Feature

- A mechanism for AdmissionChecks to provide labels, annotations, tolerations and node selectors to the pod templates when starting a job (kubernetes-sigs#1180, @mimowo)
- A reference standalone controller that can be used to support plain Pods using taints and tolerations, which can be used in Kubernetes versions that don't support scheduling gates. (kubernetes-sigs#1111, @nstogner)
- Add Active condition to AdmissionChecks (kubernetes-sigs#1193, @trasc)
- Add optional cluster queue resource quota and usage metrics. (kubernetes-sigs#982, @trasc)
- Add support for AdmissionChecks, a mechanism for internal or external components to influence whether a Workload can be admitted. (kubernetes-sigs#1045, @trasc)
- Add support for single plain Pods. (kubernetes-sigs#1072, @achernevskii)
- Add support for workload Priority (kubernetes-sigs#1081, @Gekko0114)
- Add tolerations to ResourceFlavor. Kueue injects these tolerations to the jobs that are assigned to the flavor when admitted. (kubernetes-sigs#1248, @trasc)
- Added pprof endpoints for profiling (kubernetes-sigs#978, @stuton)
- Allow the admission of multiple workloads within one scheduling cycle while borrowing. (kubernetes-sigs#1039, @trasc)
- An option to synchronize batch/job.completions with parallelism in case of partial admission (kubernetes-sigs#971, @trasc)
- Expose cluster queue information about pending workloads (kubernetes-sigs#1069, @stuton)
- Expose probe configurations to helm chart (kubernetes-sigs#986, @yyzxw)
- Graduate Partial admission to Beta. (kubernetes-sigs#1221, @trasc)
- Integrate with Cluster Autoscaler's ProvisioningRequest via two stage admission (kubernetes-sigs#1154, @trasc)
- Manage cluster queue active state based on admission checks life cycle. (kubernetes-sigs#1079, @trasc)
- Metrics for usage and reservations in ClusterQueues and LocalQueues. (kubernetes-sigs#1206, @trasc)
- Options to allow workloads to borrow quota or preempt other workloads before trying the next flavor in the list (kubernetes-sigs#849, @KunWuLuan)
- Support kubeflow.org/mxjob (kubernetes-sigs#1183, @tenzen-y)
- Support kubeflow.org/paddlejob (kubernetes-sigs#1142, @tenzen-y)
- Support kubeflow.org/pytorchjob (kubernetes-sigs#995, @tenzen-y)
- Support kubeflow.org/tfjob (kubernetes-sigs#1068, @tenzen-y)
- Support kubeflow.org/xgboostjob (kubernetes-sigs#1114, @tenzen-y)
- Workload objects have the label `kueue.x-k8s.io/job-uid` where the value matches the uid of the parent job, whether that's a Job, MPIJob, RayJob, JobSet (kubernetes-sigs#1032, @achernevskii)

Bug or Regression

- Adjust resources (based on LimitRanges, PodOverhead and resource limits) on existing Workloads when a LocalQueue is created (kubernetes-sigs#1197, @alculquicondor)
- Ensure the ClusterQueue status is updated as the number of pending workloads changes. (kubernetes-sigs#1135, @mimowo)
- Fix resuming of RayJob after preempted. (kubernetes-sigs#1156, @kerthcet)
- Fixed missing create verb for webhook (kubernetes-sigs#1035, @stuton)
- Fixed scheduler to only allow one admission or preemption per cycle within a cohort that has ClusterQueues borrowing quota (kubernetes-sigs#1023, @alculquicondor)
- Helm: Enable the JobSet integration by default (kubernetes-sigs#1184, @tenzen-y)
- Improve job controller to be resilient to API failures during preemption (kubernetes-sigs#1005, @alculquicondor)
- Prevent workloads in ClusterQueue with StrictFIFO from blocking higher priority workloads in other ClusterQueues in the same cohort that require preemption (kubernetes-sigs#1024, @alculquicondor)
- Terminate Kueue when there is an internal failure during setup, so that it can be retried. (kubernetes-sigs#1077, @alculquicondor)

Other (Cleanup or Flake)

- Add client-go library for AdmissionCheck (kubernetes-sigs#1104, @tenzen-y)
- Add mergeStrategy:merge to all conditions of API objects (kubernetes-sigs#1089, @alculquicondor)
- Update ray-operator to v0.6.0 (kubernetes-sigs#1231, @lowang-bh)
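The ResourceFlavor tolerations feature listed under v0.5.0 can be pictured with a small sketch. It assumes the tolerations live under `spec.tolerations` next to `spec.nodeLabels`; the flavor name, label, and toleration values are hypothetical, and `inject_tolerations` is an illustrative helper, not Kueue's own code. It only shows the general idea of appending a flavor's tolerations to a pod spec on admission.

```python
import json


def resource_flavor(name, node_labels, tolerations):
    """Build a ResourceFlavor manifest (field placement is an assumption)."""
    return {
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "ResourceFlavor",
        "metadata": {"name": name},
        "spec": {
            "nodeLabels": dict(node_labels),
            "tolerations": list(tolerations),
        },
    }


def inject_tolerations(pod_spec, flavor):
    """Return a copy of pod_spec with the flavor's tolerations appended.

    Illustration only: tolerations already present are not duplicated,
    and the input pod_spec is left unmodified.
    """
    merged = dict(pod_spec)
    tolerations = list(pod_spec.get("tolerations", []))
    for toleration in flavor["spec"].get("tolerations", []):
        if toleration not in tolerations:
            tolerations.append(toleration)
    merged["tolerations"] = tolerations
    return merged


if __name__ == "__main__":
    # Hypothetical flavor for tainted spot nodes.
    spot = resource_flavor(
        "spot",
        {"example.com/capacity": "spot"},
        [{"key": "spot", "operator": "Exists", "effect": "NoSchedule"}],
    )
    pod_spec = {"containers": [{"name": "main", "image": "busybox"}]}
    print(json.dumps(spot, indent=2))
    print(json.dumps(inject_tolerations(pod_spec, spot), indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, since kubectl accepts JSON manifests as well as YAML.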
Changes since `v0.4.1`:

- Adjust resources (based on LimitRanges, PodOverhead and resource limits) on existing Workloads when a LocalQueue is created (kubernetes-sigs#1197, @alculquicondor)
- Fix resuming of RayJob after preempted. (kubernetes-sigs#1190, @kerthcet)
- Fixed missing create verb for webhook (kubernetes-sigs#1053, @stuton)
- Fixed scheduler to only allow one admission or preemption per cycle within a cohort that has ClusterQueues borrowing quota (kubernetes-sigs#1029, @alculquicondor)
- Prevent workloads in ClusterQueue with StrictFIFO from blocking higher priority workloads in other ClusterQueues in the same cohort that require preemption (kubernetes-sigs#1030, @alculquicondor)
Merge pull request kubernetes-sigs#959 from kubernetes-sigs/website-r… …elease-0.4 Update for release 0.4
Changes since `v0.3.0`:

API Change

- Report resource usage in LocalQueue. (kubernetes-sigs#737, @tenzen-y)

Feature

- Add client-go libraries. (kubernetes-sigs#789, @tenzen-y)
- Add support for Kuberay's RayJobs. (kubernetes-sigs#667, @trasc)
- Add support for dynamic reclaim in the JobSet integration. (kubernetes-sigs#901, @trasc)
- Add support for partial workload admission (kubernetes-sigs#771, @trasc)
- Add support for dynamic resource reclaim. (kubernetes-sigs#756, @trasc)
- Allow the scheduler to admit more jobs when the head job has not reached the PodReady=true status. (kubernetes-sigs#708, @KunWuLuan)
- Allow specifying the manager pod and container security context instead of hardcoded values (kubernetes-sigs#878, @bh-tt)
- Feature gates for alpha/experimental features are introduced to the Kueue project. (kubernetes-sigs#788, @kerthcet)
- Ignore integrations whose CRDs aren't installed; otherwise, all integrations are enabled by default (kubernetes-sigs#883, @stuton)
- Integrate JobSet into Kueue (kubernetes-sigs#762, @mcariatm)

Bug or Regression

- Add permission to update frameworkjob status. (kubernetes-sigs#797, @tenzen-y)
- Fix a bug where update events for ClusterQueues were created endlessly. (kubernetes-sigs#907, @tenzen-y)
- Fix a bug where a child batch/job of an unmanaged parent (doesn't have queue name) was being suspended. (kubernetes-sigs#835, @tenzen-y)
- Fix panic in cluster queue if resources and coveredResources do not have the same length. (kubernetes-sigs#787, @kannon92)
- Fix: Enforce borrowed=0 if ClusterQueue doesn't belong to a cohort. (kubernetes-sigs#759, @tenzen-y)
- Fix: Potential over-admission within cohort when borrowing. (kubernetes-sigs#805, @trasc)
- Fixed preemption to prefer preempting workloads that were more recently admitted. (kubernetes-sigs#843, @stuton)
- Fixed a bug where the suspend=true added to the job/mpijob by the default webhook did not take effect. (kubernetes-sigs#758, @fjding)

Other (Cleanup or Flake)

- Add validation for child jobs without ownerReference. (kubernetes-sigs#865, @tenzen-y)
Kueue v0.3.2

Changes since `v0.3.1`:

- Add permission to update frameworkjob status. (kubernetes-sigs#798, @tenzen-y)
- Fix a bug where a child batch/job of an unmanaged parent (doesn't have queue name) was being suspended. (kubernetes-sigs#839, @tenzen-y)
- Fix panic in cluster queue if resources and coveredResources do not have the same length. (kubernetes-sigs#799, @kannon92)
- Fix: Potential over-admission within cohort when borrowing. (kubernetes-sigs#822, @trasc)
- Fixed preemption to prefer preempting workloads that were more recently admitted. (kubernetes-sigs#845, @stuton)
Changes since `v0.3.0`:

- Fix a bug where the validation webhook doesn't validate the queue name set as a label when creating an MPIJob. kubernetes-sigs#711
- Fix a bug that overwrites the queue name in workloads with an empty value when using framework jobs that use batch/job internally, such as MPIJob. kubernetes-sigs#713
- Fix a bug in which borrowed values are set to a non-zero value even though the ClusterQueue doesn't belong to a cohort. kubernetes-sigs#761
- Fixed adding suspend=true to job/mpijob by the default webhook. kubernetes-sigs#765
Merge pull request kubernetes-sigs#685 from alculquicondor/main-0.3.0 Update docs to v0.3.0 in main branch
Changes since `v0.2.1`:

- Support for kubeflow's MPIJob (v2beta1)
- Upgrade the `config.kueue.x-k8s.io` API version from `v1alpha1` to `v1beta1`. `v1alpha1` is no longer supported. `v1beta1` includes the following changes:
  - Add `namespace` to propagate the namespace where kueue is deployed to the webhook certificate.
  - Add `internalCertManagement` with fields `enable`, `webhookServiceName` and `webhookSecretName`.
  - Remove `enableInternalCertManagement`. Use `internalCertManagement.enable` instead.
- Upgrade the `kueue.x-k8s.io` API version from `v1alpha2` to `v1beta1`. `v1alpha2` is no longer supported. `v1beta1` includes the following changes:
  - `ClusterQueue`:
    - Immutability of `spec.queueingStrategy`.
    - Refactor `quota.min` and `quota.max` into `nominalQuota` and `borrowingLimit`.
    - Swap hierarchy between `resources` and `flavors`.
    - Group flavors and resources into `spec.resourceGroups` to make co-dependent resources explicit.
    - Move `admission` from `spec` to `status`.
    - Add `conditions` field to `status`.
  - `LocalQueue`:
    - Add `admitted` field in `status`.
    - Add `conditions` field to `status`.
  - `Workload`:
    - Add `metadata` to `podSet` templates.
    - Move `admission` into `status`.
  - `ResourceFlavor`:
    - Introduce `spec` to hold all fields.
    - Rename `labels` to `nodeLabels`.
    - Rename `taints` to `nodeTaints`.
- Reduce API calls by setting `.status.admission` and updating the `Admitted` condition in the same API call.
- Obtain queue names from label `kueue.x-k8s.io/queue-name`. The annotation with the same name is still supported, but it's now deprecated.
- Multiplatform support for `linux/amd64` and `linux/arm64`.
- Validating webhook for `batch/v1.Job` validates kueue-specific labels and annotations.
- Sequential admission of jobs https://kueue.sigs.k8s.io/docs/tasks/setup_sequential_admission/
- Preemption within ClusterQueue and cohort https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#preemption
- Support for LimitRanges when calculating jobs usage.
- Library for integrating job-like CRDs (controller and webhooks) https://sigs.k8s.io/kueue/pkg/controller/jobframework
- E2E tests for Kubernetes 1.24, 1.25 and 1.26 on Kind
- Improve readability and code location in logging kubernetes-sigs#14
- Optimized configuration for small size clusters with higher API QPS and number of workers.
- Reproducible load tests https://sigs.k8s.io/kueue/test/performance
- Documentation website https://kueue.sigs.k8s.io/docs/
- Fix job controller ClusterRole for clusters that enable OwnerReferencesPermissionEnforcement admission control validation kubernetes-sigs#392
- Fix race condition when admission attempt and requeuing happen at the same time kubernetes-sigs#427
- Atomically release quota and requeue previously inadmissible workloads kubernetes-sigs#512
- Fix support for leader election kubernetes-sigs#580
- Fix support for RuntimeClass when calculating jobs usage kubernetes-sigs#565
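The `v1beta1` layout described in the notes above (flavors grouped under `spec.resourceGroups`, each resource carrying `nominalQuota` and an optional `borrowingLimit`, and the queue name read from a label) might look roughly like the following sketch. It builds the manifests as plain Python dictionaries. The object names and quota values are hypothetical, and the manifest is deliberately minimal: a real ClusterQueue likely needs more fields than the ones these notes mention.

```python
import json


def cluster_queue_v1beta1(name, flavor, quotas):
    """Build a minimal ClusterQueue manifest in the v1beta1 shape.

    `quotas` maps a resource name to a (nominalQuota, borrowingLimit) pair;
    a borrowingLimit of None leaves that field out.
    """
    resources = []
    for resource, (nominal, borrowing) in quotas.items():
        entry = {"name": resource, "nominalQuota": nominal}
        if borrowing is not None:
            entry["borrowingLimit"] = borrowing
        resources.append(entry)
    return {
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "ClusterQueue",
        "metadata": {"name": name},
        "spec": {
            "resourceGroups": [
                {
                    # Resources listed together are treated as co-dependent.
                    "coveredResources": list(quotas),
                    # Flavors now contain resources, not the other way round.
                    "flavors": [{"name": flavor, "resources": resources}],
                }
            ]
        },
    }


def queue_name_labels(local_queue):
    """Labels that point a job at a LocalQueue (replaces the old annotation)."""
    return {"kueue.x-k8s.io/queue-name": local_queue}


if __name__ == "__main__":
    # Hypothetical names and quantities, for illustration only.
    cq = cluster_queue_v1beta1(
        "team-a-cq",
        "default-flavor",
        {"cpu": ("9", "3"), "memory": ("36Gi", None)},
    )
    print(json.dumps(cq, indent=2))
    print(json.dumps(queue_name_labels("team-a-queue")))
```

Quantities are passed as strings so that values such as `36Gi` and plain counts are handled the same way.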