Tags · penggu/kueue · GitHub


v0.6.0-devel

Update version references to v0.5.0 (kubernetes-sigs#1262)

Change-Id: I0027f2027dd8c73e24f7c6611733921de8d179da

v0.5.0

Kueue v0.5.0

Changes since `v0.4.0`:

Changes by Kind

Feature

- A mechanism for AdmissionChecks to provide labels, annotations, tolerations and node selectors to the pod templates when starting a job (kubernetes-sigs#1180, @mimowo)
- A reference standalone controller that supports plain Pods using taints and tolerations, for Kubernetes versions that don't support scheduling gates. (kubernetes-sigs#1111, @nstogner)
- Add Active condition to AdmissionChecks (kubernetes-sigs#1193, @trasc)
- Add optional cluster queue resource quota and usage metrics. (kubernetes-sigs#982, @trasc)
- Add support for AdmissionChecks, a mechanism for internal or external components to influence whether a Workload can be admitted. (kubernetes-sigs#1045, @trasc)
- Add support for single plain Pods. (kubernetes-sigs#1072, @achernevskii)
- Add support for workload Priority (kubernetes-sigs#1081, @Gekko0114)
- Add tolerations to ResourceFlavor. Kueue injects these tolerations to the jobs that are assigned to the flavor when admitted. (kubernetes-sigs#1248, @trasc)
- Added pprof endpoints for profiling (kubernetes-sigs#978, @stuton)
- Allow the admission of multiple workloads within one scheduling cycle while borrowing. (kubernetes-sigs#1039, @trasc)
- An option to synchronize batch/job.completions with parallelism in case of partial admission (kubernetes-sigs#971, @trasc)
- Expose cluster queue information about pending workloads (kubernetes-sigs#1069, @stuton)
- Expose probe configurations in the Helm chart (kubernetes-sigs#986, @yyzxw)
- Graduate Partial admission to Beta. (kubernetes-sigs#1221, @trasc)
- Integrate with Cluster Autoscaler's ProvisioningRequest via two stage admission (kubernetes-sigs#1154, @trasc)
- Manage cluster queue active state based on admission checks life cycle. (kubernetes-sigs#1079, @trasc)
- Metrics for usage and reservations in ClusterQueues and LocalQueues. (kubernetes-sigs#1206, @trasc)
- Options to allow workloads to borrow quota or preempt other workloads before trying the next flavor in the list (kubernetes-sigs#849, @KunWuLuan)
- Support kubeflow.org/mxjob (kubernetes-sigs#1183, @tenzen-y)
- Support kubeflow.org/paddlejob (kubernetes-sigs#1142, @tenzen-y)
- Support kubeflow.org/pytorchjob (kubernetes-sigs#995, @tenzen-y)
- Support kubeflow.org/tfjob (kubernetes-sigs#1068, @tenzen-y)
- Support kubeflow.org/xgboostjob (kubernetes-sigs#1114, @tenzen-y)
- Workload objects have the label `kueue.x-k8s.io/job-uid` where the value matches the uid of the parent job, whether that's a Job, MPIJob, RayJob, or JobSet (kubernetes-sigs#1032, @achernevskii)
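The AdmissionCheck features above (kubernetes-sigs#1045, kubernetes-sigs#1154) describe two-stage admission: quota is reserved first, and the Workload is admitted only after its checks pass. The Go sketch below is a simplified, hypothetical model of that decision, not Kueue's controller code; the types, the precedence between states and the check name `provisioning` are assumptions made for illustration.

```go
package main

import "fmt"

// CheckState is a simplified stand-in for the state an AdmissionCheck
// reports on a Workload (names chosen for this sketch).
type CheckState string

const (
	CheckPending  CheckState = "Pending"
	CheckReady    CheckState = "Ready"
	CheckRetry    CheckState = "Retry"
	CheckRejected CheckState = "Rejected"
)

// Workload is a hypothetical, stripped-down model: whether quota has been
// reserved for it, and the current state of each admission check by name.
type Workload struct {
	QuotaReserved bool
	Checks        map[string]CheckState
}

// Decision is the outcome this sketch derives for a Workload.
type Decision string

const (
	Wait    Decision = "wait"
	Admit   Decision = "admit"
	Requeue Decision = "requeue"
	Reject  Decision = "reject"
)

// decide models two-stage admission: quota is reserved first, and the
// Workload is admitted only once every admission check reports Ready.
// In this sketch a rejection wins over a retry, which wins over a pending check.
func decide(w Workload) Decision {
	if !w.QuotaReserved {
		return Wait
	}
	rejected, retry, notReady := false, false, false
	for _, state := range w.Checks {
		switch state {
		case CheckReady:
			// This check is satisfied.
		case CheckRejected:
			rejected = true
		case CheckRetry:
			retry = true
		default:
			// Pending or any unrecognized state counts as not ready yet.
			notReady = true
		}
	}
	switch {
	case rejected:
		return Reject
	case retry:
		return Requeue
	case notReady:
		return Wait
	default:
		return Admit
	}
}

func main() {
	w := Workload{
		QuotaReserved: true,
		Checks:        map[string]CheckState{"provisioning": CheckPending},
	}
	fmt.Println(decide(w)) // wait
	w.Checks["provisioning"] = CheckReady
	fmt.Println(decide(w)) // admit
}
```

Flags are collected first and resolved afterwards so the result does not depend on Go's randomized map iteration order.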

Bug or Regression

- Adjust resources (based on LimitRanges, PodOverhead and resource limits) on existing Workloads when a LocalQueue is created (kubernetes-sigs#1197, @alculquicondor)
- Ensure the ClusterQueue status is updated as the number of pending workloads changes. (kubernetes-sigs#1135, @mimowo)
- Fix resuming of a RayJob after it is preempted. (kubernetes-sigs#1156, @kerthcet)
- Fixed missing create verb for webhook (kubernetes-sigs#1035, @stuton)
- Fixed scheduler to only allow one admission or preemption per cycle within a cohort that has ClusterQueues borrowing quota (kubernetes-sigs#1023, @alculquicondor)
- Helm: Enable the JobSet integration by default (kubernetes-sigs#1184, @tenzen-y)
- Improve job controller to be resilient to API failures during preemption (kubernetes-sigs#1005, @alculquicondor)
- Prevent workloads in ClusterQueue with StrictFIFO from blocking higher priority workloads in other ClusterQueues in the same cohort that require preemption (kubernetes-sigs#1024, @alculquicondor)
- Terminate Kueue when there is an internal failure during setup, so that it can be retried. (kubernetes-sigs#1077, @alculquicondor)

Other (Cleanup or Flake)

- Add client-go library for AdmissionCheck (kubernetes-sigs#1104, @tenzen-y)
- Add mergeStrategy:merge to all conditions of API objects (kubernetes-sigs#1089, @alculquicondor)
- Update ray-operator to v0.6.0 (kubernetes-sigs#1231, @lowang-bh)

v0.4.2

Changes since `v0.4.1`:

- Adjust resources (based on LimitRanges, PodOverhead and resource limits) on existing Workloads when a LocalQueue is created (kubernetes-sigs#1197, @alculquicondor)
- Fix resuming of a RayJob after it is preempted. (kubernetes-sigs#1190, @kerthcet)

v0.4.1

- Fixed missing create verb for webhook (kubernetes-sigs#1053, @stuton)
- Fixed scheduler to only allow one admission or preemption per cycle within a cohort that has ClusterQueues borrowing quota (kubernetes-sigs#1029, @alculquicondor)
- Prevent workloads in ClusterQueue with StrictFIFO from blocking higher priority workloads in other ClusterQueues in the same cohort that require preemption (kubernetes-sigs#1030, @alculquicondor)

v0.5.0-devel

Merge pull request kubernetes-sigs#959 from kubernetes-sigs/website-release-0.4

Update for release 0.4

v0.4.0

Changes since `v0.3.0`:

API Change

- Report resource usage in LocalQueue. (kubernetes-sigs#737, @tenzen-y)

Feature

- Add client-go libraries. (kubernetes-sigs#789, @tenzen-y)
- Add support for Kuberay's RayJobs. (kubernetes-sigs#667, @trasc)
- Add support for dynamic reclaim in the JobSet integration. (kubernetes-sigs#901, @trasc)
- Add support for partial workload admission (kubernetes-sigs#771, @trasc)
- Add the support for dynamic resources reclaim. (kubernetes-sigs#756, @trasc)
- Allow the scheduler to admit more jobs when the head job has not yet reached the PodReady=true status. (kubernetes-sigs#708, @KunWuLuan)
- Allow specifying the manager pod and container security context instead of hardcoded values (kubernetes-sigs#878, @bh-tt)
- Introduce feature gates for alpha/experimental features in the Kueue project. (kubernetes-sigs#788, @kerthcet)
- All integrations are enabled by default; an integration is ignored if its CRD isn't installed. (kubernetes-sigs#883, @stuton)
- Integrate JobSet into kueue (kubernetes-sigs#762, @mcariatm)
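Partial admission (kubernetes-sigs#771) lets a job start with fewer pods than requested when quota is short. One way to pick the pod count is a binary search between a minimum count and the requested count; the Go sketch below assumes that strategy, a single resource and a fixed per-pod request. The function name and the numbers are hypothetical, and Kueue's own search may differ in detail.

```go
package main

import (
	"fmt"
	"sort"
)

// largestFittingCount returns the largest pod count n, with
// minCount <= n <= count, for which fits(n) is true. It assumes fits is
// monotone: if n pods fit, any smaller count fits too. ok is false when
// not even minCount pods fit (or minCount > count).
func largestFittingCount(count, minCount int, fits func(n int) bool) (n int, ok bool) {
	// Candidate counts are minCount+k for k in [0, m).
	m := count - minCount + 1
	// sort.Search returns the smallest k for which minCount+k pods do NOT
	// fit, or m if every candidate fits.
	k := sort.Search(m, func(k int) bool { return !fits(minCount + k) })
	if k == 0 {
		return 0, false
	}
	return minCount + k - 1, true
}

func main() {
	const perPodMilliCPU = 500
	const availableMilliCPU = 3200
	fits := func(n int) bool { return n*perPodMilliCPU <= availableMilliCPU }

	n, ok := largestFittingCount(10, 4, fits)
	fmt.Println(n, ok) // 6 true
}
```

Binary search needs only O(log(count - minCount)) quota checks, which matters when each `fits` call re-runs flavor assignment rather than a single multiplication.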

Bug or Regression

- Add permission to update frameworkjob status. (kubernetes-sigs#797, @tenzen-y)
- Fix a bug where update events for ClusterQueues were created endlessly. (kubernetes-sigs#907, @tenzen-y)
- Fix a bug where a child batch/job of an unmanaged parent (doesn't have queue name) was being suspended. (kubernetes-sigs#835, @tenzen-y)
- Fix panic in cluster queue if resources and coveredResources do not have the same length. (kubernetes-sigs#787, @kannon92)
- Fix: Enforce borrowed=0 if ClusterQueue doesn't belong to a cohort. (kubernetes-sigs#759, @tenzen-y)
- Fix: Potential over-admission within cohort when borrowing. (kubernetes-sigs#805, @trasc)
- Fixed preemption to prefer preempting workloads that were more recently admitted. (kubernetes-sigs#843, @stuton)
- Fixed a bug where the `suspend=true` added to a Job/MPIJob by the default webhook did not take effect. (kubernetes-sigs#758, @fjding)
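The `nominalQuota` and `borrowingLimit` fields from v0.3.0, together with the fix that enforces borrowed=0 for a ClusterQueue outside a cohort (kubernetes-sigs#759), can be illustrated with a small model. The Go sketch below is hypothetical: it tracks one resource in a fixed unit (milli-CPU here) and does not model how much unused quota the cohort actually has to lend.

```go
package main

import "fmt"

// ResourceQuota is a hypothetical model of one resource's quota in a
// ClusterQueue, in a fixed unit such as milli-CPU.
type ResourceQuota struct {
	NominalQuota   int64
	BorrowingLimit *int64 // nil: no per-queue cap on borrowing
}

// borrowed reports how much of usage exceeds the nominal quota. A
// ClusterQueue outside a cohort has no one to borrow from, so it is 0.
func borrowed(usage int64, q ResourceQuota, inCohort bool) int64 {
	if !inCohort || usage <= q.NominalQuota {
		return 0
	}
	return usage - q.NominalQuota
}

// fitsOwnLimits reports whether adding request to usage stays within the
// ClusterQueue's own settings: nominalQuota alone outside a cohort, or
// nominalQuota plus borrowingLimit (when set) inside one.
func fitsOwnLimits(usage, request int64, q ResourceQuota, inCohort bool) bool {
	newUsage := usage + request
	if !inCohort {
		return newUsage <= q.NominalQuota
	}
	if q.BorrowingLimit == nil {
		return true
	}
	return newUsage <= q.NominalQuota+*q.BorrowingLimit
}

func main() {
	limit := int64(2000)
	q := ResourceQuota{NominalQuota: 10000, BorrowingLimit: &limit}
	fmt.Println(borrowed(11500, q, true))           // 1500
	fmt.Println(borrowed(11500, q, false))          // 0
	fmt.Println(fitsOwnLimits(9000, 3000, q, true)) // true
	fmt.Println(fitsOwnLimits(9000, 3500, q, true)) // false
}
```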

Other (Cleanup or Flake)

- Add validation for child jobs without ownerReference. (kubernetes-sigs#865, @tenzen-y)

v0.3.2

Kueue v0.3.2

Changes since `v0.3.1`:

- Add permission to update frameworkjob status. (kubernetes-sigs#798, @tenzen-y)
- Fix a bug where a child batch/job of an unmanaged parent (doesn't have queue name) was being suspended. (kubernetes-sigs#839, @tenzen-y)
- Fix panic in cluster queue if resources and coveredResources do not have the same length. (kubernetes-sigs#799, @kannon92)
- Fix: Potential over-admission within cohort when borrowing. (kubernetes-sigs#822, @trasc)
- Fixed preemption to prefer preempting workloads that were more recently admitted. (kubernetes-sigs#845, @stuton)

v0.3.1

Changes since `v0.3.0`:

- Fix a bug where the validation webhook didn't validate the queue name set as a label when creating an MPIJob. kubernetes-sigs#711
- Fix a bug where the queue name in workloads was updated to an empty value when using framework jobs that use batch/job internally, such as MPIJob. kubernetes-sigs#713
- Fix a bug where borrowed values were set to a non-zero value even though the ClusterQueue doesn't belong to a cohort. kubernetes-sigs#761
- Fixed adding `suspend=true` to Job/MPIJob by the default webhook. kubernetes-sigs#765

v0.4.0-devel

Merge pull request kubernetes-sigs#685 from alculquicondor/main-0.3.0

Update docs to v0.3.0 in main branch

v0.3.0

Changes since `v0.2.1`:

- Support for kubeflow's MPIJob (v2beta1)
- Upgrade the `config.kueue.x-k8s.io` API version from `v1alpha1` to `v1beta1`. `v1alpha1` is no longer supported.
  `v1beta1` includes the following changes:
  - Add `namespace` to propagate the namespace where kueue is deployed to the webhook certificate.
  - Add `internalCertManagement` with fields `enable`, `webhookServiceName` and `webhookSecretName`.
  - Remove `enableInternalCertManagement`. Use `internalCertManagement.enable` instead.
- Upgrade the `kueue.x-k8s.io` API version from `v1alpha2` to `v1beta1`.
  `v1alpha2` is no longer supported.
  `v1beta1` includes the following changes:
  - `ClusterQueue`:
    - Immutability of `spec.queueingStrategy`.
    - Refactor `quota.min` and `quota.max` into `nominalQuota` and `borrowingLimit`.
    - Swap the hierarchy between `resources` and `flavors`.
    - Group flavors and resources into `spec.resourceGroups` to make
      co-dependent resources explicit.
    - Move `admission` from `spec` to `status`.
    - Add `conditions` field to `status`.
  - `LocalQueue`:
    - Add `admitted` field in `status`.
    - Add `conditions` field to `status`.
  - `Workload`:
    - Add `metadata` to `podSet` templates.
    - Move `admission` into `status`.
  - `ResourceFlavor`:
    - Introduce `spec` to hold all fields.
    - Rename `labels` to `nodeLabels`.
    - Rename `taints` to `nodeTaints`.
- Reduce API calls by setting `.status.admission` and updating the `Admitted` condition in the same API call.
- Obtain queue names from label `kueue.x-k8s.io/queue-name`. The annotation with
  the same name is still supported, but it's now deprecated.
- Multiplatform support for `linux/amd64` and `linux/arm64`.
- Validating webhook for `batch/v1.Job` validates kueue-specific labels and
  annotations.
- Sequential admission of jobs https://kueue.sigs.k8s.io/docs/tasks/setup_sequential_admission/
- Preemption within ClusterQueue and cohort https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#preemption
- Support for LimitRanges when calculating jobs usage.
- Library for integrating job-like CRDs (controller and webhooks) https://sigs.k8s.io/kueue/pkg/controller/jobframework
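Since v0.3.0 the queue name is read from the `kueue.x-k8s.io/queue-name` label, with the annotation of the same name still accepted but deprecated. A minimal Go sketch of that lookup follows, assuming the label takes precedence when both are set; the function name and the sample values `team-a` and `legacy` are hypothetical.

```go
package main

import "fmt"

// queueNameKey is used both as the label (preferred since v0.3.0) and as
// the deprecated annotation.
const queueNameKey = "kueue.x-k8s.io/queue-name"

// queueName returns the LocalQueue name for a job, preferring the label
// and falling back to the deprecated annotation. It returns "" when
// neither is set. Reading from a nil map is safe in Go.
func queueName(labels, annotations map[string]string) string {
	if name := labels[queueNameKey]; name != "" {
		return name
	}
	return annotations[queueNameKey]
}

func main() {
	labels := map[string]string{queueNameKey: "team-a"}
	annotations := map[string]string{queueNameKey: "legacy"}
	fmt.Println(queueName(labels, annotations)) // team-a
	fmt.Println(queueName(nil, annotations))    // legacy
}
```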

- E2E tests for Kubernetes 1.24, 1.25 and 1.26 on Kind
- Improve readability and code location in logging kubernetes-sigs#14
- Optimized configuration for small size clusters with higher API QPS and number
  of workers.
- Reproducible load tests https://sigs.k8s.io/kueue/test/performance
- Documentation website https://kueue.sigs.k8s.io/docs/

- Fix job controller ClusterRole for clusters that enable OwnerReferencesPermissionEnforcement admission control validation kubernetes-sigs#392
- Fix race condition when admission attempt and requeuing happen at the same time kubernetes-sigs#427
- Atomically release quota and requeue previously inadmissible workloads kubernetes-sigs#512
- Fix support for leader election kubernetes-sigs#580
- Fix support for RuntimeClass when calculating jobs usage kubernetes-sigs#565