Insights: ray-project/ray
Overview
2 Releases published by 2 people
-
ray-2.47.0 Ray-2.47.0
published
Jun 12, 2025 -
ray-2.47.1 Ray-2.47.1
published
Jun 18, 2025
409 Pull requests merged by 90 people
-
[Doc] Convert configuring-autoscaling.ipynb back to markdown docs
#54111 merged
Jun 28, 2025 -
[Docs][KubeRay] Convert rayservice-quick-start.ipynb back to markdown docs
#54138 merged
Jun 28, 2025 -
[Doc][KubeRay] Convert raycluster-quick-start.ipynb back to markdown docs
#54125 merged
Jun 28, 2025 -
[Doc][KubeRay] Add doc for running KubeRay dashboard
#53830 merged
Jun 28, 2025 -
[ci] remove `ci/keep_alive`
#54079 merged
Jun 28, 2025 -
[data] gather dask tests into single test files
#54163 merged
Jun 28, 2025 -
[Data] Add TooManyRequests catch to BQ writer
#54000 merged
Jun 28, 2025 -
[Data] Fix `test_binary` setup fixture that doesn't close file handles
#54028 merged
Jun 28, 2025 -
[serve] Increase default uvicorn keep alive timeout
#54127 merged
Jun 27, 2025 -
[doc] fix broken links in the vllm guide
#54161 merged
Jun 27, 2025 -
[Docs][KubeRay] Delete KubeRay doctests
#54080 merged
Jun 27, 2025 -
[Feat][Core] Implement Event Aggregator Agent
#53182 merged
Jun 27, 2025 -
Feat/middleware callback support
#54106 merged
Jun 27, 2025 -
[data] Handle HuggingFace parquet dataset resolve URLs
#54146 merged
Jun 27, 2025 -
[data] Use `write_dataset` for partitioning & writing to file instead of custom implementation
#54052 merged
Jun 27, 2025 -
Correct asyncio ref documentation for Python 3.11+
#54157 merged
Jun 27, 2025 -
[core][test] fix flaky data races in NodeManagerTest
#54129 merged
Jun 27, 2025 -
[Doc][KubeRay] verl example
#54114 merged
Jun 27, 2025 -
[RLlib] Fix shapes in `explained_variance` for recurrent policies.
#54005 merged
Jun 27, 2025 -
[ci] add cibase tags for ci base envs
#53755 merged
Jun 27, 2025 -
Remove `botocore` dependency in Ray Serve LLM
#54156 merged
Jun 27, 2025 -
(serve.llm) Remove test leakage from placement bundle logic
#53723 merged
Jun 27, 2025 -
[data] split dask and modin tests
#54122 merged
Jun 26, 2025 -
[Data] Fixing PyArrow overflow handling
#53971 merged
Jun 26, 2025 -
[serve] split call_user_method
#54104 merged
Jun 26, 2025 -
[Data] Handle Huggingface Integration CI test failures
#54128 merged
Jun 26, 2025 -
[Data] Fix ActorPool autoscaler to properly scale up
#53983 merged
Jun 26, 2025 -
use gtm datalayer directly, fix format
#54144 merged
Jun 26, 2025 -
[package] remove `__api__` in `setup.py`
#54143 merged
Jun 26, 2025 -
[Minor][Fix][Core/Test] Fix test_actor_restart_on_node_failure wrong test logic without waiting
#54088 merged
Jun 26, 2025 -
[data] fix repartitioning empty datasets
#54107 merged
Jun 26, 2025 -
[Doc][KubeRay] revert kuberay-gcs-ft.ipynb to markdown
#54084 merged
Jun 26, 2025 -
Fix sort_benchmark release test arg
#54145 merged
Jun 26, 2025 -
[Doc][KubeRay] Convert rayjob-quick-start.ipynb back to markdown docs
#54093 merged
Jun 26, 2025 -
[core] split dask and modin tests
#54121 merged
Jun 26, 2025 -
[Core] Remove Unnecessary Checks in GRPC Server Shutdown Process
#53910 merged
Jun 26, 2025 -
[core] Delete unused env vars
#54095 merged
Jun 26, 2025 -
[Doc][KubeRay] Remove `rayserve-dev-doc.md`
#54057 merged
Jun 26, 2025 -
[core] Bump timeout in `test_ray_init`
#54136 merged
Jun 26, 2025 -
[core] Clean up unused FFs
#54139 merged
Jun 26, 2025 -
[core] Fix GCS crash on duplicate MarkJobFinished RPCs due to network failures
#53951 merged
Jun 26, 2025 -
[train] Remove usage of `ray._private.state`
#54142 merged
Jun 26, 2025 -
[core] Deflake `test_scheduling.py` in client mode
#54137 merged
Jun 26, 2025 -
[core] Fix `test_basic_3.py` in client mode
#54135 merged
Jun 26, 2025 -
[serve] refactor _run_user_code
#54103 merged
Jun 26, 2025 -
[Doc] vale ignores anchors of headers
#53580 merged
Jun 26, 2025 -
set config for ua tag
#54112 merged
Jun 26, 2025 -
[Serve.llm] Add a doc snippet to inform users about existing diffs between vllm serve and ray serve llm.
#54042 merged
Jun 26, 2025 -
[ci][docs] Add test tag rule for Vale files
#54118 merged
Jun 26, 2025 -
[train] update beginner pytorch example
#54124 merged
Jun 26, 2025 -
[Data] Bumped latest PA version to 20.0
#54123 merged
Jun 26, 2025 -
[ci] fix missing `dask` tag in all tags list
#54113 merged
Jun 26, 2025 -
[core][test] fix data races in NodeManagerTest
#54097 merged
Jun 25, 2025 -
[core] Remove experimental "array" library
#54105 merged
Jun 25, 2025 -
[core] Clean up `test_locality_aware_leasing_borrowed_objects`
#54086 merged
Jun 25, 2025 -
[core][refactor] replace unnecessary shared_ptrs with unique_ptrs and references in raylet
#54062 merged
Jun 25, 2025 -
[ci] fix mac ci by pinning cython version
#54061 merged
Jun 25, 2025 -
[core] Deflake `test_basic_3.py`
#54083 merged
Jun 25, 2025 -
remove final references to plasma_event_handler
#54085 merged
Jun 25, 2025 -
[core] Deflake `test_ray_init`
#54094 merged
Jun 25, 2025 -
[core] Deflake `test_actor_restart`
#54087 merged
Jun 25, 2025 -
Updated stalebot to run every 12 hours.
#54041 merged
Jun 25, 2025 -
[serve] Prefer localhost instead of host ip for microbenchmarks
#54092 merged
Jun 25, 2025 -
[train] Driver SIGINT calls controller abort
#53600 merged
Jun 25, 2025 -
[data] Split out long running scaling test
#54045 merged
Jun 25, 2025 -
[core] Deflake `test_actor_unavailable_conn_broken`
#54090 merged
Jun 25, 2025 -
[V2][Autoscaler] Fix `numOfHosts` > 1 slice termination logic
#54063 merged
Jun 25, 2025 -
[V2][Autoscaler] Add `cloud_instance_id` to all V2 Austoscaler termination requests
#53938 merged
Jun 25, 2025 -
Fix autoscaler recovery docker config to use node-specific settings
#53992 merged
Jun 25, 2025 -
[data/preprocessors] Improve execution perf for One Hot encoding
#54022 merged
Jun 25, 2025 -
[Docs][KubeRay] Update changes from KubeRay 1.3.2 to 1.4.0
#53886 merged
Jun 25, 2025 -
[core] Fix comment
#53853 merged
Jun 25, 2025 -
[ci] add `-sSL` for curl on node install
#54060 merged
Jun 25, 2025 -
updating compile comment
#54058 merged
Jun 25, 2025 -
Revert "remove extraneous index.rst file for e2e examples (part 2)"
#54051 merged
Jun 25, 2025 -
[data] fix lint error in conftest.py
#54053 merged
Jun 25, 2025 -
[serve] Use `get_application_url` in test_metrics
#54050 merged
Jun 24, 2025 -
[ci] update anyscale layer
#54043 merged
Jun 24, 2025 -
[serve.llm] Prefix aware router eviction thread improvements
#53957 merged
Jun 24, 2025 -
[serve] Remove hardcoded urls from serve microbenchmarks
#54026 merged
Jun 24, 2025 -
[core] fix detached actor being unexpectedly killed
#53562 merged
Jun 24, 2025 -
[POC] fix test_metrics
#54037 merged
Jun 24, 2025 -
[serve] Handle request with Semaphore
#54019 merged
Jun 24, 2025 -
remove extraneous index.rst file for e2e examples (part 2)
#54023 merged
Jun 24, 2025 -
[☀️] Fix repr for ray.ObjectRef, ray.ObjectRefGenerator types
#54011 merged
Jun 24, 2025 -
[core][ci] Disable test db for container tests
#54031 merged
Jun 24, 2025 -
[docker] Update latest Docker dependencies for 2.47.1 release
#54016 merged
Jun 23, 2025 -
[core] improve assertion check in test_task_metrics
#53958 merged
Jun 23, 2025 -
remove extraneous index.rst file for e2e-multimodal-ai-workloads
#54017 merged
Jun 23, 2025 -
[Serve.llm] Remove ImageRetriever class and related tests from the LLM deployment module.
#53980 merged
Jun 23, 2025 -
fix test_request_timeout timeout mismatch issue
#54010 merged
Jun 23, 2025 -
fix gsat global
#54012 merged
Jun 23, 2025 -
[train] Fix release test missing data key
#53963 merged
Jun 23, 2025 -
[data] remove schema from release tests
#53956 merged
Jun 23, 2025 -
[kuberay] log actionable err msg when required TPU node selectors missing
#53914 merged
Jun 23, 2025 -
[core] Fix flaky `test_state_api`
#53975 merged
Jun 23, 2025 -
[data] remove operator_fusion_benchmark
#53962 merged
Jun 23, 2025 -
[Data] Add reading from Delta Lake tables and from Unity Catalog
#53701 merged
Jun 23, 2025 -
test: refactor `test_observability_helpers`
#53875 merged
Jun 23, 2025 -
[core] Remove actor task path in normal task submitter
#53996 merged
Jun 23, 2025 -
[core] Rename `GcsFunctionManager` and use fake in test
#53973 merged
Jun 23, 2025 -
[Serve.llm][P/D] Fix health check in prefill disagg
#53937 merged
Jun 22, 2025 -
[Test][KubeRay] Update KubeRay version to v1.4.0 for autoscaler tests
#53974 merged
Jun 22, 2025 -
[core] Fix ActorClass.remote return typing and expose Actor class methods to static analysis
#53986 merged
Jun 21, 2025 -
[core] Use core worker client pool in GCS
#53654 merged
Jun 21, 2025 -
[core] Revert container tests to medium size instance
#53966 merged
Jun 21, 2025 -
Fix ray import error when both ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES are set
#53757 merged
Jun 20, 2025 -
[core] Making NodeManager use ILocalTaskManager instead of TaskManager.
#53961 merged
Jun 20, 2025 -
defer loading csat so gtag loads first
#53968 merged
Jun 20, 2025 -
fix ga4 events
#53967 merged
Jun 20, 2025 -
[train][template] Remove clock emoji which does not always render well
#53965 merged
Jun 20, 2025 -
[core][gpu-objects] Support `ray.get` on the driver process for GPU objects
#53902 merged
Jun 19, 2025 -
[kuberay] Update helm install command in prometheus doc to set serviceMonitor `release=prometheus`
#53952 merged
Jun 19, 2025 -
[Docs] Fix async code in serving notebook
#53864 merged
Jun 19, 2025 -
[core][rocm] Allow CUDA_VISIBLE_DEVICS and HIP_VISIBLE_DEVICES
#53531 merged
Jun 19, 2025 -
[train][template] Pip install with python block instead
#53928 merged
Jun 19, 2025 -
[Data] Refactor `Planner` to avoid storing plan-specific state
#53955 merged
Jun 19, 2025 -
[core] Avoid unnecessary deserialization/serialization of CallerWorkerId
#53939 merged
Jun 19, 2025 -
[serve] add ability to track child requests
#53941 merged
Jun 19, 2025 -
[Doc][KubeRay] Add a doc for scheduler plugins
#53846 merged
Jun 19, 2025 -
[core][telemetry/08] record counter metric e2e
#53449 merged
Jun 19, 2025 -
[HashShuffle] - Add warnings for when there are insufficient resources for Aggregators
#53705 merged
Jun 19, 2025 -
[Data] Join release tests
#53903 merged
Jun 19, 2025 -
[docs][Serve] Add clarification for health check and FT of serve deployments
#53944 merged
Jun 19, 2025 -
fastapi and streaming tests use get applications api
#53949 merged
Jun 19, 2025 -
[RLlib; docs] Fix docstring example for custom MultiRLModule with shared encoder.
#53912 merged
Jun 19, 2025 -
[Data] Prevent filename collisions on write
#53890 merged
Jun 19, 2025 -
[data] fix flakey schema
#53901 merged
Jun 19, 2025 -
[Data] Fixed `BlockMetadata` derivation for `Read` operator
#53908 merged
Jun 19, 2025 -
[core] Fix flaky `test_worker_exit_intended_user_exit`
#53909 merged
Jun 19, 2025 -
fix the bash code run error in notebook
#53900 merged
Jun 19, 2025 -
[Docs] Fix issues with e2e audio tutorial
#53932 merged
Jun 19, 2025 -
[train] Cleanups for training ingest benchmark
#53684 merged
Jun 19, 2025 -
[train] add proper filtering to metrics
#53788 merged
Jun 18, 2025 -
[cgraph] Avoid depending on torch CPU module for CPU-only actor
#53849 merged
Jun 18, 2025 -
[train] expose training input/output in callbacks
#53869 merged
Jun 18, 2025 -
Skip test_metrics_agent_with_open_telemetry on mac
#53917 merged
Jun 18, 2025 -
[Docs] Add ServiceMonitor section and make some step optional in Grafana & Promethus page
#53474 merged
Jun 18, 2025 -
[Docs][KubeRay] Update KubeRay operator installation references for all docs
#53885 merged
Jun 18, 2025 -
[Core] Support AMD GPU MI3xx product line
#51802 merged
Jun 18, 2025 -
[Doc][KubeRay] Update KubeRay operator installation reference
#53842 merged
Jun 18, 2025 -
[Docs][KubeRay] Fix RayJob quickstart doc step 9 error
#53887 merged
Jun 18, 2025 -
[Core] Use fd instead of handle for windows log redirection
#53852 merged
Jun 18, 2025 -
Add dashboard visualizations for TPU metrics
#53898 merged
Jun 18, 2025 -
[ObjectStore] Warn if object store is allocated < 50% of total memory for data workloads
#53857 merged
Jun 18, 2025 -
[Data] Deprecate use_polars flag
#53867 merged
Jun 17, 2025 -
[data] split test_all_to_all.py
#53865 merged
Jun 17, 2025 -
add missing configs for object detection template
#53895 merged
Jun 17, 2025 -
[core] Remove hardcoded flaky tests
#53888 merged
Jun 17, 2025 -
[Serve][LLM] Simplify _prepare_engine_config()
#53704 merged
Jun 17, 2025 -
[core][gpu-objects] Fix `test_gpu_objects_nccl.py`
#53874 merged
Jun 17, 2025 -
[RLlib] MetricsLogger: Fix `get/set_state` to handle tensors in `self.values`.
#53514 merged
Jun 17, 2025 -
[Data] Improve handling of mismatched columns
#53861 merged
Jun 17, 2025 -
Fix pickle error with remote code models in vLLM Ray worker process
#53815 merged
Jun 17, 2025 -
[train][template] Remove ineffective post build script and pip install instead
#53822 merged
Jun 17, 2025 -
[core][gpu objects] Integrate single-controller collective APIs with GPU objects
#53720 merged
Jun 16, 2025 -
[Data] Improve handling of `pandas.NA`
#53859 merged
Jun 16, 2025 -
[devx] Fix 'uv run' command line parsing
#53838 merged
Jun 16, 2025 -
[Data] Improve `read_text` trailing newline semantics
#53860 merged
Jun 16, 2025 -
[Serve.llm][P/D] Support separate deployment config for PDProxy in Prefill disagg
#53821 merged
Jun 16, 2025 -
[Doc][KubeRay] Remove `vllm-rayservice.md` and use Ray Serve LLM instead
#53844 merged
Jun 16, 2025 -
add api to get application url
#53796 merged
Jun 16, 2025 -
[Doc][KubeRay] Remove very old ResNet benchmark example
#53839 merged
Jun 16, 2025 -
[release] Fix release tests
#53855 merged
Jun 16, 2025 -
[Serve.llm] Disable TP=2 VLM batch test
#53825 merged
Jun 16, 2025 -
[Doc][Fix] reveal the falsely hidden export command in the KubeRay GCS FT guide
#53832 merged
Jun 16, 2025 -
[core][gpu-objects] Support intra-process communication
#53798 merged
Jun 16, 2025 -
[Doc][KubeRay] Remove very old XGBoostTrainer example
#53837 merged
Jun 16, 2025 -
[core] Release resources only after tasks have stopped executing
#53660 merged
Jun 16, 2025 -
[core] Deflake `test_multiprocessing.py`
#53802 merged
Jun 16, 2025 -
[core] Fix `test_object_spilling.py` on Windows
#53851 merged
Jun 16, 2025 -
[KubeRay] Remove unused YAMLs
#53840 merged
Jun 16, 2025 -
[chore] Change file mode of `rayservice-no-ray-serve-replica.md` from 755 to 644
#53843 merged
Jun 16, 2025 -
fix `AggregateFnV2` doc to state `finalize` instead of `_finalize`
#53835 merged
Jun 16, 2025 -
[core] Fix GCS subscribers map race condition
#53781 merged
Jun 16, 2025 -
[core] deleting unused code from plasma client
#53814 merged
Jun 16, 2025 -
[core] Fix race condition in raylet graceful shutdown
#53762 merged
Jun 16, 2025 -
[serve] Revert request timeout from serve instance fixtures
#53809 merged
Jun 16, 2025 -
[Doc] Remove "Deploying a static Ray cluster without KubeRay"
#53833 merged
Jun 15, 2025 -
[Doc] Small mistake in kuberay ingress
#53834 merged
Jun 15, 2025 -
[ci] bazelize `get_contributors` script
#53743 merged
Jun 14, 2025 -
[ci] First release test on GKE
#53390 merged
Jun 14, 2025 -
Replace `python setup.py bdist_wheel` with `pip wheel`
#53458 merged
Jun 14, 2025 -
[serve] Set route_prefix and docs_path when re-deploying app
#53753 merged
Jun 14, 2025 -
[cherry-pick][dashboard] Fix retrieving IP address from the GPUProfilingManager on the dashboard agent
#53817 merged
Jun 14, 2025 -
Add tpu usage metrics to reporter_agent
#53678 merged
Jun 14, 2025 -
[data] Refactor interface for actor_pool_map_operator
#53752 merged
Jun 13, 2025 -
ray-llm container cu124 -> cu128 update
#53730 merged
Jun 13, 2025 -
[dashboard] Fix retrieving IP address from the `GPUProfilingManager` on the dashboard agent
#53807 merged
Jun 13, 2025 -
[ci/release] Trigger Ray release by running a Bazel binary
#52962 merged
Jun 13, 2025 -
version change for 2.47.1
#53813 merged
Jun 13, 2025 -
cherrypick #53671
#53812 merged
Jun 13, 2025 -
[core] Move dependencies of NodeManger to main.cc for better testability
#53782 merged
Jun 13, 2025 -
[core] Deflake `test_object_spilling.py`
#53803 merged
Jun 13, 2025 -
[core] Deflake `test_state_api.py`
#53804 merged
Jun 13, 2025 -
[tune] update BlockMetadata args in tests
#53791 merged
Jun 13, 2025 -
[serve] Fix autoscaling metrics
#53778 merged
Jun 13, 2025 -
pass route prefix to replica
#53777 merged
Jun 13, 2025 -
[Serve] Call shared long poll client router registration in event loop
#53613 merged
Jun 13, 2025 -
[core] Add timeout to `ray.get` call in `test_update_object_location_batch_failure`
#53805 merged
Jun 13, 2025 -
[RLlib] Fix device check in `Learner`.
#53706 merged
Jun 13, 2025 -
[core] Deflake `test_client_builder.py`
#53774 merged
Jun 13, 2025 -
[core] Increase instance sizes for wheel / HA tests
#53783 merged
Jun 13, 2025 -
[serve.llm] Organize spread out utils.py
#53722 merged
Jun 13, 2025 -
[Doc] Added ray-serve llm doc
#52832 merged
Jun 12, 2025 -
Remove Schema From BlockMetadata
#53454 merged
Jun 12, 2025 -
[Core] Exit the Core Worker Early Error Received from Plasma Store
#53679 merged
Jun 12, 2025 -
fix: WandbLogger crashing silently on a FileNotFoundError
#50308 merged
Jun 12, 2025 -
[Serve] feat: make ray.serve.batch concurrent
#53096 merged
Jun 12, 2025 -
[RLlib] Add ability to compute percentiles to MetricsLogger/Stats
#52963 merged
Jun 12, 2025 -
[core][telemetry] move the open telemetry tests into a pytest module
#53751 merged
Jun 12, 2025 -
[core] Speed up `test_actor_advanced.py`
#53738 merged
Jun 12, 2025 -
Replace `miniconda3` with `miniforge`
#53436 merged
Jun 12, 2025 -
[serve] Improve test_metrics
#53747 merged
Jun 12, 2025 -
update codeowners for ray serve
#53717 merged
Jun 11, 2025 -
[Tune][Air] Fix MLflowLoggerCallback to enable its use with PBT (#27783)
#42182 merged
Jun 11, 2025 -
Fix vLLM batch test by changing to Pixtral
#53744 merged
Jun 11, 2025 -
Fix uv tests on macos x86_64
#53741 merged
Jun 11, 2025 -
[core][telemetry/07] support counter metric on worker side
#53418 merged
Jun 11, 2025 -
[docker] Update latest Docker dependencies for 2.47.0 release
#53749 merged
Jun 11, 2025 -
[docker] Update latest Docker dependencies for 2.47.0 release
#53748 merged
Jun 11, 2025 -
[train] Raise error when calling ray.train.report with a gpu tensor
#53725 merged
Jun 11, 2025 -
[serve] Increase httpx timeout to 30s for backpressure test
#53711 merged
Jun 11, 2025 -
[ci] Move `simulate_storage` from `_private/` to `_common/`
#53735 merged
Jun 11, 2025 -
[core] Give io context concurrency hint
#53642 merged
Jun 11, 2025 -
[serve] Remove dependency on `ray._private.ray_constants.py`
#53700 merged
Jun 11, 2025 -
[core] Warning when creating actor with restarts and arguments in plasma
#53713 merged
Jun 11, 2025 -
[Docs] [istio mtls] Add warning on sidecar OOM for mTLS
#53385 merged
Jun 11, 2025 -
[cherry-pick][Docs] Added user-guide for Joins (#52987)
#53712 merged
Jun 11, 2025 -
Add observability for label-selectors
#53423 merged
Jun 11, 2025 -
[Doc] Bind the version of kuberay to v1.3.0 in related docs
#53661 merged
Jun 11, 2025 -
[docs] fix link in gcp-gke-tpu-cluster.md
#53708 merged
Jun 11, 2025 -
Add perf metrics for 2.47.0
#53668 merged
Jun 11, 2025 -
[train][template] pytorch + train + data template uses absolute links
#53718 merged
Jun 11, 2025 -
[train] add trace to WorkerHealthCheckFailedError
#53626 merged
Jun 11, 2025 -
fix CI, wrong import path
#53715 merged
Jun 10, 2025 -
[core][gpu-objects] Fix the performance regression by clearing `object_ref` for small and non-GPU objects
#53692 merged
Jun 10, 2025 -
[Docs] Finalize time-series tutorial, add lockfiles
#53710 merged
Jun 10, 2025 -
[core] remove dead open telemetry code
#53709 merged
Jun 10, 2025 -
E2e rag
#53703 merged
Jun 10, 2025 -
[core] Add single-controller API for ray.util.collective and torch gloo backend
#53319 merged
Jun 10, 2025 -
[core] Migrate ray.private.pydantic_compat from _private to _common
#53686 merged
Jun 10, 2025 -
[core][3/N] Avoid unnecessary deserialization/serialization of ParentTaskId
#53695 merged
Jun 10, 2025 -
[core] Remove deprecated `storage` parameter to `ray.init`
#53669 merged
Jun 10, 2025 -
[serve.llm] delete dead code from prompt format days
#53621 merged
Jun 10, 2025 -
[core] Fix `test_multi_tenancy.py` on Windows
#53699 merged
Jun 10, 2025 -
[core] Remove unused `object_ref_seed` parameter
#53698 merged
Jun 10, 2025 -
[core] early exit spill if spilling config is empty
#53193 merged
Jun 10, 2025 -
[ci] Fix crane auth issue for nightly multi arch tagging
#53483 merged
Jun 10, 2025 -
handle task cancellation error
#53680 merged
Jun 10, 2025 -
Code refactoring in proxy
#53644 merged
Jun 10, 2025 -
[core] Migrate wait_for_condition and async_wait_for_condition from _private to _common
#53652 merged
Jun 10, 2025 -
Convert cluster compute config in release test to Kuberay compute config
#53681 merged
Jun 10, 2025 -
[Core] Vendor setproctitle
#53471 merged
Jun 10, 2025 -
add back run on anyscale button
#53688 merged
Jun 10, 2025 -
[Compiled Graph] Enhance Compile Graph with Multi-Device Support
#53395 merged
Jun 10, 2025 -
BLD: Remove redundant `manylinux1` related flag in `.bazelrc`
#53549 merged
Jun 10, 2025 -
[train][template] Add Anyscale template for pytorch + train + data
#53220 merged
Jun 10, 2025 -
[core] Remove deprecated `ray start` CLI options
#53675 merged
Jun 10, 2025 -
[core] Speed up & deflake `test_multitenancy.py`
#53674 merged
Jun 10, 2025 -
[ci] change macos intel platform to 12_0
#53671 merged
Jun 10, 2025 -
[Docs] Create lockfiles for various e2e tutorials
#53672 merged
Jun 10, 2025 -
[Docs] Adds second notebook to timeseries tutorial
#53561 merged
Jun 10, 2025 -
[ci] fix misconfig on byod scripts
#53682 merged
Jun 10, 2025 -
Fix map_batches release test back_to_back option
#53664 merged
Jun 9, 2025 -
[ci] Resize some runtime_env tests
#53670 merged
Jun 9, 2025 -
[core] Creating an interface ObjectManager's for GrpcClientManager.
#53656 merged
Jun 9, 2025 -
[core][1/N] Avoid unnecessary deserialization/serialization of TaskId
#53577 merged
Jun 9, 2025 -
make constant for x-request-id
#53667 merged
Jun 9, 2025 -
Remove `ray.workflow` package
#53612 merged
Jun 9, 2025 -
add vale to pre-commit
#53564 merged
Jun 9, 2025 -
Add DataContext + LogicalOp Args to Dataset Export
#53554 merged
Jun 9, 2025 -
[core] Skip test on mac build
#53662 merged
Jun 9, 2025 -
[core] Make preloading Jemalloc configurable for worker
#47243 merged
Jun 9, 2025 -
[llm] bump vllm to 0.9.0.1
#53443 merged
Jun 9, 2025 -
uint8_t* data ptr not used.
#47565 merged
Jun 9, 2025 -
[core][2/N] Avoid unnecessary deserialization/serialization of ObjectId
#53574 merged
Jun 9, 2025 -
[ci] add more docker groups to work with buildkite amis
#53640 merged
Jun 9, 2025 -
[ci] use new docker account for releasing
#53646 merged
Jun 8, 2025 -
[core] Correctly fail worker lease request if a task becomes infeasible after scheduling
#52295 merged
Jun 8, 2025 -
pin flashinfer-python to 0.2.5
#53637 merged
Jun 7, 2025 -
[core] Skip `test_output.py::test_autoscaler_v2_stream_events_with_filter` on Windows
#53628 merged
Jun 7, 2025 -
[ci] use new docker account for releasing
#53629 merged
Jun 7, 2025 -
Enhance map_batches release tests
#53627 merged
Jun 6, 2025 -
remove CTA for README.ipynb
#53618 merged
Jun 6, 2025 -
[Docs] Combine audio curation tutorial into a single notebook
#53589 merged
Jun 6, 2025 -
skip failing wheel test
#53624 merged
Jun 6, 2025 -
Add Image ID and Size Parameters to Azure node provider
#53298 merged
Jun 6, 2025 -
[core] Shorten test name too long on Windows
#53616 merged
Jun 6, 2025 -
Revert "[core][gpu-objects] GPU Objects POC (#52938)"
#53602 merged
Jun 6, 2025 -
[core] Speed up `test_placement_group_5.py`
#53611 merged
Jun 6, 2025 -
[core] Remove errant comment
#53614 merged
Jun 6, 2025 -
[core] Fix Windows failures for `test_output.py`
#53609 merged
Jun 6, 2025 -
[Serve][Doc] Add custom request router docs
#53511 merged
Jun 6, 2025 -
[core] Move runtime_env Ray client tests into the dedicated build
#53480 merged
Jun 6, 2025 -
[Data] Remove `num_free_slots` in favor of `num_free_task_slots`
#53555 merged
Jun 6, 2025 -
[core] Move some utils to `ray._common.utils` to replace `ray._private.utils` usage in libraries
#53287 merged
Jun 6, 2025 -
[core][telemetry/06] record gauge metric e2e
#53231 merged
Jun 6, 2025 -
Fix python executable path for ray nsight configuration
#53598 merged
Jun 6, 2025 -
updating dead links
#53573 merged
Jun 5, 2025 -
Revert "[Core] Upgrade vendored setproctitle to 1.3.6 (#53544)"
#53593 merged
Jun 5, 2025 -
[Docs] Fix xgboost anyscale job submission
#53565 merged
Jun 5, 2025 -
[Docs] Add e2e timeseries example to examples and release tests
#53488 merged
Jun 5, 2025 -
[core][telemetry] fix tsan issues
#53559 merged
Jun 5, 2025 -
Fix broken link in documentation
#53547 merged
Jun 5, 2025 -
[core] Skip `test_reference_counting_2.py::test_recursively_pass_returned_object_ref` on Windows
#53587 merged
Jun 5, 2025 -
[core] Minor speedups in `test_failure.py`
#53585 merged
Jun 5, 2025 -
[core] Speed up and deflake `test_output.py`
#53584 merged
Jun 5, 2025 -
make $100 banner dismissable
#53527 merged
Jun 5, 2025 -
[core] refactor: consolidate `_common` test files under `_common/tests` directory
#53543 merged
Jun 5, 2025 -
[ci] Add `spark_on_ray` tag to `_ALL_TAGS` set used in determining tests to run
#53579 merged
Jun 5, 2025 -
[ci] allow adding byod post install script without CI team approval
#53537 merged
Jun 5, 2025 -
[Core] Upgrade vendored setproctitle to 1.3.6
#53544 merged
Jun 4, 2025 -
[core] Refactor actor method binding with `ActorMethodShell` and remove weakrefs
#53178 merged
Jun 4, 2025 -
[Serve] add usage telemetry for custom request router
#53541 merged
Jun 4, 2025 -
[core] Update very old doc message
#53557 merged
Jun 4, 2025 -
[CI] Re-enable isort for python/ray/tune
#52733 merged
Jun 4, 2025 -
Ray serve/lora doc fix
#53553 merged
Jun 4, 2025 -
Replace `miniconda3` with `miniforge` for linux
#53528 merged
Jun 4, 2025 -
[core][telemetry/05] refactor open_telemetry_metric_recoder.py
#53380 merged
Jun 4, 2025 -
[core][telemetry] fix windows build
#53540 merged
Jun 4, 2025 -
[Core] Deflake test_tempfile.py
#53451 merged
Jun 4, 2025 -
[core] Revert error log when gcs kills actors
#53534 merged
Jun 4, 2025 -
[train] add hardware metrics to grafana
#53218 merged
Jun 3, 2025 -
add object detection notebooks
#50965 merged
Jun 3, 2025 -
[core] Deflake test_gcs_connection_no_leak
#53526 merged
Jun 3, 2025 -
[core] Remove `ray.experimental.packaging`
#53524 merged
Jun 3, 2025 -
[Serve] Implement recording routing stats periodically
#53355 merged
Jun 3, 2025 -
[core][telemetry/04] register and record metrics on worker side
#53209 merged
Jun 3, 2025 -
[Serve] Add helpers to rank replicas with multiplex and locality
#51890 merged
Jun 3, 2025 -
[core] Fix `test_runtime_env.py` in CI
#53517 merged
Jun 3, 2025 -
[RLlib] Resolve numerical instabilities in `MeanStdFilter`.
#53484 merged
Jun 3, 2025 -
[serve.llm][test] add type annotation for probes/models.py
#53340 merged
Jun 3, 2025 -
[core] Fix windows conda activate with conda.bat as executable in conda path
#40779 merged
Jun 3, 2025 -
[core] Handle node death in object manager
#53397 merged
Jun 3, 2025 -
[min test] install pytest more generally
#53507 merged
Jun 3, 2025 -
Fix iter_rows involving _get_max_chunk_size
#53495 merged
Jun 3, 2025 -
update version to 2.47.0
#53494 merged
Jun 3, 2025 -
[Data] Remove assertion that actors with infinite retries can't be dead
#53491 merged
Jun 2, 2025 -
[docs] Add antipattern for nested ray.get
#43184 merged
Jun 2, 2025 -
Fix iter_rows involving _get_max_chunk_size
#53487 merged
Jun 2, 2025 -
Fix lingering socket issue in tests by replacing requests with httpx
#53434 merged
Jun 2, 2025 -
[core] Use shared cluster fixture for `test_runtime_env.py`
#53479 merged
Jun 2, 2025 -
[Doc][KuberRay] add doc for kuberay with uv
#53303 merged
Jun 2, 2025 -
added e2e multimodal ai workloads example
#53415 merged
Jun 2, 2025 -
[Docs] Fixes for xgboost Jobs and Services
#53455 merged
Jun 2, 2025 -
move signal and semaphore to `_common/`
#53457 merged
Jun 2, 2025 -
Update chaos testing utility to simulate grace period
#53425 merged
Jun 2, 2025 -
[core] Split out `runtime_env` unit tests
#53477 merged
Jun 2, 2025 -
[docs][kuberay] update autoscaling guide
#53403 merged
Jun 2, 2025 -
[core] Pin github URI in runtime_env tests
#53476 merged
Jun 2, 2025 -
[tune] relax test_experiment_restore timeout
#53387 merged
Jun 2, 2025 -
consume autoscaling metrics from both handle and replica
#53453 merged
Jun 2, 2025 -
[Data] Update batch inference release test to use CPUs
#53441 merged
Jun 2, 2025 -
[core] test_advanced_9 speedup
#53305 merged
Jun 2, 2025 -
[core] Remove unused `on_worker_shutdown` callback from the core worker
#53389 merged
Jun 2, 2025 -
setup-dev.py: fix invalid '-r' option in mv command
#53196 merged
Jun 2, 2025 -
[Docs] Update Grafana docs about auto-load JSON
#53256 merged
Jun 2, 2025 -
[Serve] update input arg for `choose_replicas` to be a list of candidate replicas
#53456 merged
May 31, 2025 -
Increased stale bot GA operations per run, and fixed a typo.
#53446 merged
May 31, 2025 -
[Docs] Fixes to e2e audio and xgboost notebooks for Anyscale Templates
#53452 merged
May 30, 2025 -
Reduce default Arrow Block iter_rows batch size
#53413 merged
May 30, 2025 -
Revert "Revert "[core][telemetry/03] add OpenTelemetryMetricRecorder c++ client""
#53422 merged
May 30, 2025 -
[Data] Fixing flaky aggregation test
#53383 merged
May 30, 2025 -
[ci] fix index generation
#53440 merged
May 30, 2025 -
[CI] Re-enable isort for python/ray/llm/
#53437 merged
May 30, 2025 -
[Core] Add bundle_label_selector scheduling logic
#52988 merged
May 30, 2025 -
Revert "[data] Make vectorizer, encoder and imputer preprocessor use …
#53444 merged
May 30, 2025 -
pass ingress flag into replica
#53428 merged
May 30, 2025 -
[core] Close FD in `test_label_utils.py` fixture
#53439 merged
May 30, 2025 -
[Serve] Add helper to filter out busy replicas
#53433 merged
May 30, 2025 -
[core] Allow custom grpc channel arguments
#53416 merged
May 30, 2025 -
[deps] remove opencensus python exporter
#53392 merged
May 30, 2025 -
[core] Revert KillActor rpc failure error log
#53412 merged
May 30, 2025 -
[serve.llm] Set VLLM_NIXL_SIDE_CHANNEL_HOST for PD case
#53432 merged
May 30, 2025 -
[CI] Re-enable isort for python/ray/llm/_internal/
#53429 merged
May 30, 2025 -
[ci] Add google-cloud-logging to list of buildkite dependencies
#53426 merged
May 30, 2025 -
[Docs] Adds compute configs for Anyscale workspaces for e2e xgboost & e2e audio
#53424 merged
May 30, 2025 -
Fix torchrec release test collate function
#53420 merged
May 30, 2025 -
[train] fix TensorflowTrainer docstring
#53394 merged
May 29, 2025 -
swap to big anyscale button
#53279 merged
May 29, 2025 -
[ci] Create indexes for nightly images
#52816 merged
May 29, 2025 -
Updated GitHub PR stale workflow to exclude issues entirely.
#53414 merged
May 29, 2025 -
[core][gpu-objects] GPU Objects POC
#52938 merged
May 29, 2025 -
Revert "[core][telemetry/03] add OpenTelemetryMetricRecorder c++ client"
#53421 merged
May 29, 2025 -
[core] Bump timeout in `test_task_events_2.py`
#53409 merged
May 29, 2025 -
[Serve] Add `on_request_routed` callback onto `RequestRouter`
#53272 merged
May 29, 2025 -
[core] Speed up `test_task_events_3.py`
#53404 merged
May 29, 2025 -
[ci] fix copy_files in mac ci script
#53406 merged
May 29, 2025 -
[CI] Re-enable isort for python/ray/dashboard/
#53363 merged
May 29, 2025 -
[data/preprocessors] Add flatten functionality to Concatenator
#53378 merged
May 29, 2025 -
[core][telemetry/03] add OpenTelemetryMetricRecorder c++ client
#53159 merged
May 29, 2025 -
[core] Skip `test_storage` on Windows
#53399 merged
May 29, 2025 -
[core] Increase timeout for ref counting test
#53405 merged
May 29, 2025 -
[core] Skip graceful shutdown test on Windows
#53400 merged
May 29, 2025 -
[core] Skip cgroup privileged test on Windows builds
#53401 merged
May 29, 2025 -
[core] Fix `test_label_utils.py` on windows (again)
#53388 merged
May 29, 2025 -
[core][compiled graphs] Unify scheduling for NCCL operation nodes
#53111 merged
May 29, 2025 -
[ci] migrate all copy_files call into bazel run
#53359 merged
May 28, 2025 -
[rllib] removes tf2onnx from rllib dependencies
#53339 merged
May 28, 2025 -
[serve.llm] release tests for 1p1d
#53190 merged
May 28, 2025 -
[ci] Validate BYOD type and matching Python version for release tests
#53219 merged
May 28, 2025 -
[ray-llm] Update ray-llm docker to install UCX/NIXL
#53377 merged
May 28, 2025 -
[core][telemetry/02-bis] update enable_open_telemetry flag
#53379 merged
May 28, 2025 -
[data] upgrade raydp
#53350 merged
May 28, 2025 -
[train] Log controller state transitions
#53344 merged
May 28, 2025
176 Pull requests opened by 81 people
-
[serve.llm] [cleanup] Add LLMConfig.parse_from() api
#53382 opened
May 28, 2025 -
Deduplicate schema in BlockMetadata
#53384 opened
May 28, 2025 -
[serve.llm] DO NOT REVIEW, IN DRAFT
#53391 opened
May 29, 2025 -
[WIP] [core] Attempting a basic solution to streaming generator not adding errors to plasma
#53393 opened
May 29, 2025 -
[Core] Add Logic to Emit Task Events to Event Aggregator
#53402 opened
May 29, 2025 -
[WIP] Remove `_owner` arg for `ray.put`
#53410 opened
May 29, 2025 -
kuberay edits
#53411 opened
May 29, 2025 -
[Data] Add support for ray.dataset.map_sql
#53417 opened
May 29, 2025 -
[Data] add switch for optimizer rules
#53427 opened
May 30, 2025 -
try running things with protobuf 4
#53442 opened
May 30, 2025 -
feat: Add QPS-based autoscaling policy for Ray Serve
#53445 opened
May 30, 2025 -
Bump torch from 2.0.1 to 2.7.0 in /doc/source/templates/testing/docker/03_serving_stable_diffusion
#53447 opened
May 30, 2025 -
[WIP][Data] Add support for Arrow native fixed-shape tensor type
#53450 opened
May 30, 2025 -
[Data] Add fillna function
#53459 opened
May 31, 2025 -
[Data] Added distinct function
#53460 opened
May 31, 2025 -
[core] Add as_completed and map_unordered APIs
#53461 opened
May 31, 2025 -
[core] Check if a task can be spilled before checking if args can be pinned
#53462 opened
May 31, 2025 -
[Serve] Set the docs path after app is initialized on the replica
#53463 opened
Jun 1, 2025 -
[Data] Add dropna function
#53464 opened
Jun 1, 2025 -
[core][compiled graphs] Unify and simplify NCCL operation nodes
#53470 opened
Jun 2, 2025 -
docs test coverage script
#53482 opened
Jun 2, 2025 -
[WIP][Data] Batch query for block_ref_iter
#53485 opened
Jun 2, 2025 -
[Data] Add a data compaction function
#53489 opened
Jun 2, 2025 -
[Dashboard] Fixing residual state leaks in Dashboard/Agent
#53508 opened
Jun 3, 2025 -
[core][telemetry/09] record sum metric e2e
#53512 opened
Jun 3, 2025 -
[RLlib] Wrapper which allows EnvRunners to operate on environments with Repeated observation spaces
#53519 opened
Jun 3, 2025 -
[core] Turn executed task inserted into a RAY_CHECK
#53522 opened
Jun 3, 2025 -
[serve.llm] Update ray-llm docker
#53532 opened
Jun 3, 2025 -
[data] add Lance-based ordered data conversion that keeps row_id content unchanged
#53542 opened
Jun 4, 2025 -
[RLlib] Upgrade RLlink protocol for external env/simulator training.
#53550 opened
Jun 4, 2025 -
[core] Support pip_install_options for pip
#53551 opened
Jun 4, 2025 -
Script to generate test coverage for doc files
#53556 opened
Jun 4, 2025 -
Bump torch from 2.3.0 to 2.7.1 in /python
#53558 opened
Jun 4, 2025 -
[core] Cleanup gcs event listeners and gcs_storage env variable
#53566 opened
Jun 4, 2025 -
[Data] [Draft] user guide for aggregations
#53568 opened
Jun 4, 2025 -
[DON'T MERGE]
#53575 opened
Jun 5, 2025 -
[Not for Merge] Event Aggregator Perf
#53576 opened
Jun 5, 2025 -
Update V2 Autoscaler to support scheduling using Node labels and LabelSelector API
#53578 opened
Jun 5, 2025 -
[CI] Re-enable isort for all remaining files
#53583 opened
Jun 5, 2025 -
BLD: Automatically patch ``.bazelrc`` file for Windows 11 build
#53586 opened
Jun 5, 2025 -
[Do not merge] Run ray data release tests with export API
#53594 opened
Jun 5, 2025 -
[core] Cleanup retryable grpc client
#53599 opened
Jun 6, 2025 -
[serve.llm] Add useful logging in prefill_decode_disagg.py
#53604 opened
Jun 6, 2025 -
Fix 53605
#53607 opened
Jun 6, 2025 -
[core] Remove experimental `max_cpu_frac_per_node`
#53610 opened
Jun 6, 2025 -
[rllib] IMPALA fix no attribute '_minibatch_size'
#53620 opened
Jun 6, 2025 -
[core] Support broadcast and reduce collective for compiled graphs
#53625 opened
Jun 6, 2025 -
[core] Gcs actor manager cleanup
#53633 opened
Jun 7, 2025 -
[core] Fix gcs register actor callback check
#53634 opened
Jun 7, 2025 -
[Air] Add Video FPS Support for `WandbLoggerCallback`
#53638 opened
Jun 7, 2025 -
[Serve] Check multiple FastAPI ingress deployments in a single application
#53647 opened
Jun 8, 2025 -
[core]: Correct podman output parsing for image uri in runtime env
#53653 opened
Jun 9, 2025 -
[core] Adding a nightly benchmark for continuous, bidirectional object transfer on two nodes.
#53657 opened
Jun 9, 2025 -
[refactor] Install uv from test-requirements.txt
#53685 opened
Jun 10, 2025 -
[data] allow max_calls to be a static but not dynamic option
#53687 opened
Jun 10, 2025 -
[WIP] Remove old uv runtime env plugin
#53690 opened
Jun 10, 2025 -
Bump requests from 2.32.3 to 2.32.4 in /python
#53691 opened
Jun 10, 2025 -
[RLlib; Offline RL] Implement Offline Policy Evaluation (OPE) via Importance Sampling.
#53702 opened
Jun 10, 2025 -
[serve.llm] Refactor/Consolidate LoRA downloading
#53714 opened
Jun 10, 2025 -
(serve.llm) Make _LLMServerBase.__init__ synchronous
#53719 opened
Jun 10, 2025 -
Bump scikit-learn from 1.3.2 to 1.5.1 in /doc/source/ray-overview/examples/e2e-timeseries
#53721 opened
Jun 10, 2025 -
[serve.llm] Add better logging verbosity controls
#53728 opened
Jun 11, 2025 -
Minor Documentation Fixes in Protobuf Files
#53731 opened
Jun 11, 2025 -
[RLlib; docs] Docs do-over (new API stack): `ConnectorV2` documentation.
#53732 opened
Jun 11, 2025 -
[WIP] Remove test cases for `gcs_actor_based_scheduling`
#53733 opened
Jun 11, 2025 -
[core][telemetry/10] support custom gauge+counter+sum metrics
#53734 opened
Jun 11, 2025 -
[core][telemetry/11] support histogram metric on worker side
#53740 opened
Jun 11, 2025 -
[core] upgrade opentelemetry-sdk
#53745 opened
Jun 11, 2025 -
Test
#53746 opened
Jun 11, 2025 -
Add example gpt2 tuning script
#53750 opened
Jun 11, 2025 -
[core] Add switch for the cache of runtime env
#53775 opened
Jun 12, 2025 -
[serve] Add telemetry for users with Pydantic version < 2
#53779 opened
Jun 12, 2025 -
Add `pin_memory` to `iter_torch_batches`
#53792 opened
Jun 13, 2025 -
[train] TrainStateActor periodically checks controller status and sets aborted
#53818 opened
Jun 13, 2025 -
Bump gitpython from 3.1.40 to 3.1.41 in /python
#53819 opened
Jun 13, 2025 -
Bump tqdm from 4.64.1 to 4.66.3 in /python
#53820 opened
Jun 13, 2025 -
Sharing progress with broader team
#53823 opened
Jun 14, 2025 -
[Doc][KubeRay] remove head pod trailing hash and adjust volcano output
#53826 opened
Jun 14, 2025 -
[core] Ungracefully exit if the agent dies unexpectedly
#53847 opened
Jun 16, 2025 -
[core] adding additional stats to the dump object store usage api.
#53856 opened
Jun 16, 2025 -
[core] Cleanup naming in core worker scheduling queues
#53858 opened
Jun 16, 2025 -
[core] Sleep to debug container test
#53862 opened
Jun 16, 2025 -
[core] Don't queue in flight submissions by attempt number
#53866 opened
Jun 16, 2025 -
Feat/ray serve middleware support
#53868 opened
Jun 17, 2025 -
Pass parameters to custom routers through LLMConfig
#53870 opened
Jun 17, 2025 -
[dashboard] Support to overwrite the _client_max_size of http request entity
#53880 opened
Jun 17, 2025 -
[doc][core] fix reStructuredText formatting on Resources page
#53882 opened
Jun 17, 2025 -
[Docs][KubeRay] Update all KubeRay version references for KubeRay 1.4.0 release
#53884 opened
Jun 17, 2025 -
[ci] add python 3.13 ray docker image build
#53894 opened
Jun 17, 2025 -
Bump gradio from 3.50.2 to 5.31.0 in /python/requirements
#53899 opened
Jun 17, 2025 -
python depsets tool
#53904 opened
Jun 18, 2025 -
[core] Move inner_publisher logic into gcsPublisher
#53905 opened
Jun 18, 2025 -
[WIP][core][gpu-objects] GC
#53911 opened
Jun 18, 2025 -
[RLlib] Add missing colon to CUBLAS_WORKSPACE_CONFIG
#53913 opened
Jun 18, 2025 -
[RLlib] Add missing documentation for SACConfig's training()
#53918 opened
Jun 18, 2025 -
[core][telemetry/12] record histogram metric e2e
#53927 opened
Jun 18, 2025 -
Update deletion policy for rayjob quick start
#53929 opened
Jun 18, 2025 -
[Data] - write_parquet enable both partition by & min_rows_per_file, max_rows_per_file
#53930 opened
Jun 18, 2025 -
[core][telemetry/13] performance tests
#53931 opened
Jun 18, 2025 -
[serve] move test from test_grpc to test_proxy
#53933 opened
Jun 18, 2025 -
[core] Fix race condition b/w object eviction & repinning for recovery.
#53934 opened
Jun 18, 2025 -
[core][GPU objects] Attach tensor transport to task args protobuf
#53935 opened
Jun 18, 2025 -
Bump urllib3 from 1.26.19 to 2.5.0 in /python
#53936 opened
Jun 18, 2025 -
[Data] Replaced `get_object_locations` with `get_local_object_locations`
#53942 opened
Jun 19, 2025 -
[doc][kuberay] state `rayStartParams` is optional starting with KubeRay 1.4.0
#53943 opened
Jun 19, 2025 -
[doc][kuberay] add version skew warning for plugin and RayCluster
#53950 opened
Jun 19, 2025 -
Can wins01
#53959 opened
Jun 19, 2025 -
finishing commit for issue #52113
#53964 opened
Jun 19, 2025 -
tune: make Tune status/progress tables readable in dark mode
#53969 opened
Jun 20, 2025 -
docs(data): fix broken Parameters table
#53972 opened
Jun 20, 2025 -
Feature/sac discrete
#53982 opened
Jun 20, 2025 -
[CI][KubeRay] Update KubeRay CI Tests branch for KubeRay v1.4.0 release
#53984 opened
Jun 21, 2025 -
[Core] Add AcceleratorManager implementation for Rebellions NPU
#53985 opened
Jun 21, 2025 -
[Doc] Update Istio service mesh graph
#53988 opened
Jun 21, 2025 -
[Serve] Make replica scheduler backoff configurable #52871
#53991 opened
Jun 21, 2025 -
[core] Recover intermediate objects if needed while generator running
#53999 opened
Jun 22, 2025 -
[ci][core] Fix timeouts in `test_scheduling` when run in debug mode
#54003 opened
Jun 23, 2025 -
Fixes default_dqn_torch_rl_module assuming the device is 'cpu'
#54004 opened
Jun 23, 2025 -
Added openssl support for PPC64LE.
#54006 opened
Jun 23, 2025 -
[dashboard] Clean up naming for GPU profiling module
#54009 opened
Jun 23, 2025 -
[docker] Update latest Docker dependencies for 2.47.1 release
#54015 opened
Jun 23, 2025 -
[core] test out wait_for_condition exceptions
#54018 opened
Jun 23, 2025 -
[DONOTMERGE] Proof-of-concept for GPU objects + NIXL
#54024 opened
Jun 24, 2025 -
Bump mlflow from 2.19.0 to 3.1.0 in /doc/source/ray-overview/examples/e2e-xgboost
#54027 opened
Jun 24, 2025 -
Multimodal ai
#54029 opened
Jun 24, 2025 -
[core][autoscaler][v1] add heartbeat timeout logic to determine node activity status
#54030 opened
Jun 24, 2025 -
Bump mlflow from 2.22.0 to 3.1.0 in /python
#54032 opened
Jun 24, 2025 -
[core] Delete asyncio actor logic in in-order scheduling code
#54033 opened
Jun 24, 2025 -
[core] Don't order retries at all for in-order actors
#54034 opened
Jun 24, 2025 -
gen test
#54046 opened
Jun 24, 2025 -
update all 'Run on Anyscale' buttons to redirect to respective template preview pages
#54049 opened
Jun 24, 2025 -
Add Azure Files support to persistent storage documentation
#54055 opened
Jun 24, 2025 -
[train] Add broadcast_from_rank_zero and barrier collectives
#54066 opened
Jun 25, 2025 -
[core][refactor] move NodeManager::KillWorker to WorkerInterface::Kill for better testability
#54068 opened
Jun 25, 2025 -
[RLlib] Fix env runners not being marked healthy if there is no local env runner
#54071 opened
Jun 25, 2025 -
[RLlib] Bug fix: Failed EnvRunners are not restored if there is no local EnvRunner.
#54091 opened
Jun 25, 2025 -
Adapt to vLLM reducing exports from the top level
#54099 opened
Jun 25, 2025 -
Adapt Dask on Ray to the new Dask Task class
#54108 opened
Jun 25, 2025 -
[data] Remove asserts that test internal `ds._block_num_rows()`
#54109 opened
Jun 25, 2025 -
vLLM ZMQ KVEvent Router
#54115 opened
Jun 25, 2025 -
[core] Fix "Check failed: it->second.num_retries_left == -1"
#54116 opened
Jun 25, 2025 -
[core][cgraph] Export classes related to NCCL communicator
#54117 opened
Jun 26, 2025 -
[serve] Remove usage of `ray._private.state`
#54140 opened
Jun 26, 2025 -
[core] fix checking for uv existence during ray_runtime setup
#54141 opened
Jun 26, 2025 -
[RLlib] Fix checkpoints not having correct num_env_steps_sampled_lifetime
#54148 opened
Jun 26, 2025 -
[core] Deflake `test_spread_scheduling_overrides_locality_aware_scheduling`
#54154 opened
Jun 26, 2025 -
[data] Add timeout for `test_arrow_block_scaling.py`
#54155 opened
Jun 26, 2025 -
[Data] Fix examples in some Data user guides
#54158 opened
Jun 27, 2025 -
[test] fix test not ending cluster; spelling mistake: tearDow -> tearDown
#54171 opened
Jun 27, 2025 -
[core] Add static type hints for Actor methods
#54173 opened
Jun 27, 2025 -
[Feat][Core] Don't count actor restarts due to node preemption towards max_restarts
#54175 opened
Jun 27, 2025 -
[core] Add debug prints to `test_scheduling.py::test_hybrid_policy`
#54176 opened
Jun 27, 2025 -
[serve] Move logic into user callable wrapper
#54177 opened
Jun 27, 2025 -
[Core] Use Factory method to create gcs KV Manager
#54178 opened
Jun 27, 2025 -
[core] Fix "it != submissible_tasks_.end() Tried to retry task"
#54179 opened
Jun 27, 2025 -
move Collector class to _common
#54180 opened
Jun 27, 2025 -
[Feat][Core] Don't count task retries due to node preemption
#54182 opened
Jun 27, 2025 -
[core][test] deflaky test_demand_report_when_scale_up by reducing workloads
#54183 opened
Jun 27, 2025 -
[serve] take scope, receive, send for call http entrypoint
#54184 opened
Jun 27, 2025 -
[RLlib] - Increased default timesteps on two experiments.
#54185 opened
Jun 27, 2025 -
Token-split prefix router
#54187 opened
Jun 27, 2025 -
[train] Force abort on SIGINT spam and do not abort finished runs
#54188 opened
Jun 28, 2025 -
[Serve.llm][Prototype][WIP] Simplify LLMServer and inherit OpenAIServingChat behavior
#54189 opened
Jun 28, 2025 -
[Data] Limit async UDF production queue max-size
#54190 opened
Jun 28, 2025 -
[Doc][KubeRay] Kuberay gcs ft takes yaml file with version 1.4.0
#54192 opened
Jun 28, 2025 -
[data] allow custom batcher for dataset iteration
#54193 opened
Jun 28, 2025 -
[data.llm] Add release test to capture memory leak
#54194 opened
Jun 28, 2025 -
[data] run dask tests seperately
#54195 opened
Jun 28, 2025 -
[data.llm][Bugfix] Respect tuple `concurrency` config
#54196 opened
Jun 28, 2025 -
[ci] unify macos build script across platforms
#54198 opened
Jun 28, 2025
207 Issues closed by 57 people
-
[Epic][Docs/KubeRay] Convert doctests back to normal markdown docs
#54072 closed
Jun 28, 2025 -
[Docs][KubeRay] Convert configuring-autoscaling.ipynb back to markdown docs
#54077 closed
Jun 28, 2025 -
[Docs][KubeRay] Convert rayservice-quick-start.ipynb back to markdown docs
#54076 closed
Jun 28, 2025 -
[Docs][KubeRay] Convert raycluster-quick-start.ipynb back to markdown docs
#54074 closed
Jun 28, 2025 -
CI test windows://python/ray/serve/tests:test_standalone is consistently_failing
#48420 closed
Jun 28, 2025 -
CI test linux://rllib:examples/connectors/multi_agent_with_different_observation_spaces is flaky
#53473 closed
Jun 28, 2025 -
CI test linux://python/ray/data:test_arrow_block_scaling is flaky
#54110 closed
Jun 28, 2025 -
CI test linux://rllib:examples/metrics/custom_metrics_in_algorithm_training_step is flaky
#51870 closed
Jun 28, 2025 -
[Data] When writing on BigQuery, Google's "TooManyRequests" exceptions is not retried
#53997 closed
Jun 28, 2025 -
[RayLLM] RayLLM / vLLM production stack integration
#53331 closed
Jun 27, 2025 -
CI test windows://python/ray/serve/tests:test_logging is flaky
#46043 closed
Jun 27, 2025 -
[Core] Ray fails to fulfill request due to node being annotated by IP address
#54152 closed
Jun 27, 2025 -
[Docs][KubeRay] Delete KubeRay doctests
#54073 closed
Jun 27, 2025 -
CI test linux://:local_object_manager_test is flaky
#54131 closed
Jun 27, 2025 -
[Data] `ArrowInvalid` during `ray.data.from_huggingface`: Parquet magic bytes not found in footer
#54101 closed
Jun 27, 2025 -
CI test linux://python/ray/data:test_json is flaky
#48150 closed
Jun 27, 2025 -
CI test linux://rllib:examples/algorithms/vpg_custom_algorithm is flaky
#53925 closed
Jun 27, 2025 -
CI test linux://rllib:examples/algorithms/appo_custom_algorithm_w_shared_data_actor is flaky
#53176 closed
Jun 27, 2025 -
CI test linux://rllib:examples/evaluation/evaluation_parallel_to_training_multi_agent_duration_auto is flaky
#53255 closed
Jun 27, 2025 -
CI test linux://:node_manager_test is flaky
#54059 closed
Jun 27, 2025 -
[Serve] UnboundLocalError: local variable 'stopped' in deployment state
#54169 closed
Jun 27, 2025 -
[Core] Exiting because this node manager has mistakenly been marked as dead by the GCS
#54035 closed
Jun 27, 2025 -
[bug][serve.llm] AssertionError: failed to get the hash of the compiled graph (VLM, batch, TP=2)
#53824 closed
Jun 27, 2025 -
[Serve, LLM] missing botocore dependency!
#53052 closed
Jun 27, 2025 -
Error Handling Large Pyarrow Chunk
#53536 closed
Jun 26, 2025 -
CI test linux://python/ray/train/v2:test_controller is consistently_failing
#54147 closed
Jun 26, 2025 -
[Serve][LLM] Qwen3 models “enable_thinking: False” still returns thinking process
#52979 closed
Jun 26, 2025 -
[Core] Ray fails to fulfill request due to node being annotated by IP address
#54150 closed
Jun 26, 2025 -
[Docs][KubeRay] Convert kuberay-gcs-ft.ipynb back to markdown docs
#54078 closed
Jun 26, 2025 -
[Docs][KubeRay] Convert rayjob-quick-start.ipynb back to markdown docs
#54075 closed
Jun 26, 2025 -
CI test darwin://python/ray/tests:test_basic_3_client_mode is consistently_failing
#54126 closed
Jun 26, 2025 -
CI test windows://python/ray/tests:test_basic_3_client_mode is consistently_failing
#54132 closed
Jun 26, 2025 -
[Core] Transient network failure on RPC `MarkJobFinished` causes node crash
#53645 closed
Jun 26, 2025 -
CI test linux://python/ray/tests:test_basic_3_client_mode is consistently_failing
#54119 closed
Jun 26, 2025 -
[Doc] The anchors of headers doesn't follow Vale rules.
#53516 closed
Jun 26, 2025 -
CI test linux://:local_object_manager_test is flaky
#54130 closed
Jun 26, 2025 -
[Core] Could not connect to socket
#54067 closed
Jun 26, 2025 -
[core] TSAN failing on `node_manager_test`
#54096 closed
Jun 25, 2025 -
CI test linux://python/ray/data:test_arrow_block is flaky
#48859 closed
Jun 25, 2025 -
CI test linux://python/ray/data:test_huggingface is consistently_failing
#44516 closed
Jun 25, 2025 -
CI test linux://python/ray/train:accelerate_torch_trainer_no_raydata is consistently_failing
#48939 closed
Jun 25, 2025 -
CI test linux://python/ray/train:deepspeed_torch_trainer is consistently_failing
#44517 closed
Jun 25, 2025 -
Release test training_ingest_benchmark-task=image_classification.full_training.jpeg failed
#53953 closed
Jun 25, 2025 -
[Core] Autoscaler Node Recovery Ignores Node-Specific Docker Config
#53987 closed
Jun 25, 2025 -
[Doc][KubeRay] Run doctest `user-guides/configuring-autoscaling.ipynb` in CI
#53989 closed
Jun 25, 2025 -
CI test windows://python/ray/tests:test_basic is consistently_failing
#51497 closed
Jun 25, 2025 -
[CI] Migrate from flake8 to ruff
#34889 closed
Jun 25, 2025 -
[Docker] Upgrade the base image from ubuntu:focal to ubuntu:22.04LTS
#35514 closed
Jun 25, 2025 -
CI test linux://python/ray/data:test_backpressure_e2e is flaky
#49963 closed
Jun 25, 2025 -
CI test linux://python/ray/tests:test_runtime_env_complicated is consistently_failing
#49674 closed
Jun 25, 2025 -
CI test linux://python/ray/data:test_execution_optimizer is consistently_failing
#44410 closed
Jun 25, 2025 -
[Dashboard] Decorator that exposes attribute to dashboard for display in grid
#33188 closed
Jun 24, 2025 -
[serve] AttributeError when attempting to use serve with cluster and FastAPI
#54008 closed
Jun 24, 2025 -
[gcp] Node mistakenly marked dead: increase heartbeat timeout?
#16945 closed
Jun 24, 2025 -
Docs on Cython extensions and install requirements
#7094 closed
Jun 24, 2025 -
[core] Detached actor being killed when its parent actor crashes
#40864 closed
Jun 24, 2025 -
CI test linux://doc:doctest[data] is consistently_failing
#54036 closed
Jun 24, 2025 -
CI test linux://python/ray/data:doctest is consistently_failing
#44570 closed
Jun 24, 2025 -
[data/proprocessors] Support flattening vector features in concatenator
#51757 closed
Jun 24, 2025 -
[Docs][KubeRay] Don't sleep for a long time in `kuberay-gcs-ft.ipynb`
#54040 closed
Jun 24, 2025 -
Release test many_nodes_actor_test_on_v2.aws failed
#53990 closed
Jun 24, 2025 -
CI test linux://doc/source/train/examples/lightning:lightning_cola_advanced is consistently_failing
#44545 closed
Jun 24, 2025 -
CI test linux://python/ray/train:accelerate_torch_trainer is consistently_failing
#44513 closed
Jun 24, 2025 -
CI test linux://python/ray/train:deepspeed_torch_trainer_no_raydata is consistently_failing
#44932 closed
Jun 24, 2025 -
CI test windows://python/ray/serve/tests:test_request_timeout is flaky
#48417 closed
Jun 24, 2025 -
[old]
#54020 closed
Jun 23, 2025 -
CI test linux://rllib:learning_tests_multi_agent_stateless_cartpole_ppo_multi_cpu is consistently_failing
#47313 closed
Jun 23, 2025 -
CI test windows://python/ray/serve/tests:test_batching is consistently_failing
#46016 closed
Jun 23, 2025 -
CI test linux://python/ray/tests:test_runtime_env_container is consistently_failing
#45223 closed
Jun 23, 2025 -
[CI] `linux://python/ray/tests:test_state_api` is failing/flaky on master.
#54001 closed
Jun 23, 2025 -
Ability to select a disk for ray workers
#8607 closed
Jun 23, 2025 -
Conflict between ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES environment variables causes Ray import error
#53737 closed
Jun 21, 2025 -
CI test linux://python/ray/serve/tests:test_multiplex is flaky
#48378 closed
Jun 21, 2025 -
[RLlib] MAML does not work with TF2 in Ray 2.3.1
#34620 closed
Jun 20, 2025 -
[RayData|RayServe] Does RayData/RayServe support multi-node vllm inference
#53192 closed
Jun 20, 2025 -
[Core] Core Worker crashing
#49088 closed
Jun 20, 2025 -
[core][gpu-objects] Driver tries to get the data from in-actor store
#51272 closed
Jun 19, 2025 -
[Core][ROCm] Setting CUDA_VISIBLE_DEVICES leads to an assertion
#52701 closed
Jun 19, 2025 -
[Autoscaler][V2] Autoscaler fails to delete idle KubeRay Pod
#52264 closed
Jun 19, 2025 -
CI test linux://python/ray/data:test_consumption is flaky
#48163 closed
Jun 19, 2025 -
CI test windows://python/ray/tests:test_actor_state_metrics is consistently_failing
#46303 closed
Jun 19, 2025 -
[data] ray.data.read_images is slower than reading images manually
#37499 closed
Jun 19, 2025 -
[RFC] Q2 Ray Data Roadmap
#51808 closed
Jun 19, 2025 -
[RFC] LLM APIs for Ray Data and Ray Serve
#50639 closed
Jun 19, 2025 -
CI test windows://python/ray/serve/tests:test_standalone_3 is flaky
#44003 closed
Jun 19, 2025 -
Release test compiled_graphs failed
#53716 closed
Jun 18, 2025 -
CI test darwin://python/ray/tests:test_metrics_agent_open_telemetry is consistently_failing
#53828 closed
Jun 18, 2025 -
[RLlib] ActionMaskingTorchRLModule can't set up `conv_filters`
#53325 closed
Jun 18, 2025 -
[Core] `ray.init()` and `ray start` fails on Windows 11 in ray 2.45+
#52739 closed
Jun 18, 2025 -
CI test windows://python/ray/tests:test_object_spilling_debug_mode is flaky
#43796 closed
Jun 18, 2025 -
[core] support S3 path style access in runtime_env download_and_unpack_package()
#53893 closed
Jun 17, 2025 -
How to transfer tensors stored in GPU in actor with NCCL?
#53816 closed
Jun 17, 2025 -
[Data] PyArrow 20.0.0 Backward Incompatability (`unexpected keyword argument 'maps_as_pydicts'`)
#52685 closed
Jun 17, 2025 -
CI test linux://python/ray/tests:test_gpu_objects_nccl is consistently_failing
#53871 closed
Jun 17, 2025 -
[RLlib] Headnode without GPU triggers torch/CUDA de-serialization error
#53467 closed
Jun 17, 2025 -
[Core] Ray Autoscaler does not restart a worker node on setup failure
#29127 closed
Jun 17, 2025 -
Release test llm_batch_vllm failed
#53827 closed
Jun 17, 2025 -
[Serve] Add timeout parameter for `deploy`
#25433 closed
Jun 17, 2025 -
[Core] Read-only buffer error in some scikit-learn models
#52571 closed
Jun 17, 2025 -
[core] ray stop --force doesn't kill processes on worker node
#28038 closed
Jun 17, 2025 -
[core][gpu-objects] Support TensorDict
#51550 closed
Jun 17, 2025 -
[core][gpu-objects] Allocate placeholder tensor on corresponding devices
#53622 closed
Jun 17, 2025 -
[core][gpu-objects] Driver should order all collective calls to avoid deadlock
#51264 closed
Jun 17, 2025 -
CI test windows://python/ray/tests:test_object_spilling_asan is consistently_failing
#45962 closed
Jun 17, 2025 -
CI test windows://python/ray/tests:test_object_spilling is consistently_failing
#45961 closed
Jun 16, 2025 -
[RLlib] Add syntax checking to configuration string literals or migrate to enums.
#39384 closed
Jun 16, 2025 -
[Ray Core] Ray error causes the Python interpreter to terminate without failing
#28211 closed
Jun 16, 2025 -
[CI] Test GPU training tutorial with Ray Release tests
#28902 closed
Jun 16, 2025 -
[core][gpu-objects] intra-process communication
#51685 closed
Jun 16, 2025 -
CI test windows://python/ray/tests:test_basic_client_mode is flaky
#52117 closed
Jun 13, 2025 -
[Serve] check_health with custom exception does not enter failed state, infinite retries
#53742 closed
Jun 13, 2025 -
[core][gpu-objects] Object contains multiple tensors and/or mix of CPU data and GPU tensors
#51274 closed
Jun 13, 2025 -
CI test windows://python/ray/serve/tests:test_standalone_with_comp_sche is flaky
#48425 closed
Jun 13, 2025 -
CI test linux://python/ray/tune:test_tuner is consistently_failing
#53786 closed
Jun 13, 2025 -
Release test serve_autoscaling_load_test.aws failed
#53760 closed
Jun 13, 2025 -
Release test llm_serve_llama_3dot1_8B_quantized_tp1_2p6d failed
#53769 closed
Jun 13, 2025 -
Release test llm_serve_llama_3dot1_8B_quantized_tp1_1p1d failed
#53768 closed
Jun 13, 2025 -
Release test llm_serve_llama_3dot2_1B_no_accelerator failed
#53765 closed
Jun 13, 2025 -
Release test llm_serve_llama_3dot2_1B_s3 failed
#53767 closed
Jun 13, 2025 -
Release test llm_serve_llama_3dot1_8B_lora failed
#53766 closed
Jun 13, 2025 -
Release test llm_serve_llama_3dot1_8B_quantized_tp_1 failed
#53764 closed
Jun 13, 2025 -
Release test llm_serve_llama_3dot1_8B_tp_2 failed
#53763 closed
Jun 13, 2025 -
Release test serve_scale_replicas.aws failed
#53761 closed
Jun 13, 2025 -
[llm] vllm is throwing RuntimeError("Failed to infer device type")
#51967 closed
Jun 13, 2025 -
CI test windows://python/ray/serve/tests:test_target_capacity is consistently_failing
#48426 closed
Jun 12, 2025 -
CI test windows://python/ray/serve/tests:test_telemetry is flaky
#48427 closed
Jun 12, 2025 -
CI test linux://python/ray/data:test_datasink is consistently_failing
#52098 closed
Jun 12, 2025 -
CI test linux://rllib:examples/learners/ppo_with_torch_lr_schedulers is flaky
#49181 closed
Jun 12, 2025 -
[Serve] concurrency in ray.serve.batch
#53071 closed
Jun 12, 2025 -
Release test batch_inference_hetero failed
#53601 closed
Jun 12, 2025 -
CI test darwin://python/ray/tests:test_runtime_env_uv_run_client_mode is consistently_failing
#53650 closed
Jun 11, 2025 -
CI test windows://python/ray/serve/tests:test_backpressure is consistently_failing
#50386 closed
Jun 11, 2025 -
[Kuberay] The reference of Kuberay code link should bind to a release version
#53655 closed
Jun 11, 2025 -
[Data] add boundaries or sorted flag to GroupedData.map_groups
#52577 closed
Jun 11, 2025 -
CI test windows://python/ray/tests:test_actor_failures is consistently_failing
#52130 closed
Jun 11, 2025 -
[Dashboard/Core] In KubeRay, resource list in Cluster Dashboard tab
#53641 closed
Jun 11, 2025 -
[core][gpu-objects] Performance regression caused by transferring object references for small objects
#53623 closed
Jun 10, 2025 -
[Core] Ray worker fails to register with raylet when using grpcio>=1.71.0
#53631 closed
Jun 10, 2025 -
[core] support .rayignore
#53648 closed
Jun 10, 2025 -
[core] CUDA VISIBLE DEVICES is not being set for PlacementGroups
#53643 closed
Jun 10, 2025 -
CI test linux://python/ray/tests:test_client_builder is flaky
#43570 closed
Jun 10, 2025 -
Release test map_batches_fixed_size_actors_numpy_False failed
#53658 closed
Jun 10, 2025 -
Release test map_batches_autoscaling_actors_numpy_False failed
#53659 closed
Jun 10, 2025 -
CI test windows://python/ray/tests:test_multi_tenancy is consistently_failing
#51506 closed
Jun 10, 2025 -
[Core] Ray doesn't respect object_store_memory when spilling is disabled
#53086 closed
Jun 10, 2025 -
[Serve] quick request cancellation with model composition leads to unhandled `TaskCancelledError`s
#53639 closed
Jun 10, 2025 -
Migrating from `manylinux1` to `manylinux2014`
#53548 closed
Jun 10, 2025 -
CI test linux://python/ray/data:test_map is consistently_failing
#48164 closed
Jun 9, 2025 -
CI test windows://python/ray/tests:test_output is consistently_failing
#51467 closed
Jun 9, 2025 -
CI test darwin://python/ray/tests:test_runtime_env_conda_and_pip_client_mode is consistently_failing
#53649 closed
Jun 9, 2025 -
[Core] We should make preloading Jemalloc configurable for worker
#47242 closed
Jun 9, 2025 -
CI test darwin://python/ray/tests:test_job is consistently_failing
#45537 closed
Jun 9, 2025 -
Issue with TLS Authentication
#53651 closed
Jun 9, 2025 -
Test issue (please ignore) - more text then even more text
#49867 closed
Jun 7, 2025 -
[Serve| Observability] Show the duration of each request
#36633 closed
Jun 6, 2025 -
CI test windows://python/ray/tests:test_implicit_resource is consistently_failing
#43849 closed
Jun 6, 2025 -
CI test windows://python/ray/tests:test_reference_counting_2 is flaky
#45964 closed
Jun 6, 2025 -
CI test linux://src/ray/telemetry/tests:open_telemetry_metric_recorder_test is flaky
#53538 closed
Jun 6, 2025 -
Nsight GPU profiling unable to run on systems that don't have a "python" on the PATH
#53597 closed
Jun 6, 2025 -
[data] Cannot convert dict to PyArrow blocks
#42075 closed
Jun 5, 2025 -
[data][bug] Dataset.context not being sealed after creation
#41573 closed
Jun 5, 2025 -
[data] Allow overlapping execution of multiple Datasets
#41968 closed
Jun 5, 2025 -
CI test windows://python/ray/tests:test_task_metrics is consistently_failing
#43770 closed
Jun 5, 2025 -
[Train] Get train_func return value
#49707 closed
Jun 5, 2025 -
[Core] `setproctitle` vendored dependency fails to build with GCC 15
#52944 closed
Jun 4, 2025 -
[core] Ray PlacementGroup not respecting all resources
#53525 closed
Jun 4, 2025 -
[core] aggregated metrics for `ray_tasks`/`ray_actors`
#47289 closed
Jun 4, 2025 -
Fatal Python error: Segmentation fault
#49998 closed
Jun 4, 2025 -
[Core/Train?] Minimal Ray Train run crashes with `SIGABRT` / `uv_accept: invalid argument`
#49252 closed
Jun 4, 2025 -
No backend type associated with device type npu
#50516 closed
Jun 4, 2025 -
[Train] Crash at end of training
#51527 closed
Jun 4, 2025 -
CI test linux://python/ray/tests:test_advanced_9 is flaky
#53513 closed
Jun 4, 2025 -
CI test linux://python/ray/tests:test_runtime_env is flaky
#53509 closed
Jun 4, 2025 -
[core] Ray 2.45+ broken on Windows.
#53466 closed
Jun 3, 2025 -
[Core] Use a FIPS-compliant version of BoringSSL for internal gRPC comms
#53408 closed
Jun 3, 2025 -
'Worker' object has no attribute 'core_worker'
#48682 closed
Jun 3, 2025 -
[Ray Data] Beam Search Support for vLLM Batch Inference
#53396 closed
Jun 3, 2025 -
CI test linux://python/ray/tests:test_label_utils is consistently_failing
#53506 closed
Jun 3, 2025 -
CI test linux://python/ray/tests:test_basic_5 is consistently_failing
#53505 closed
Jun 3, 2025 -
CI test linux://python/ray/tests:test_basic_3 is consistently_failing
#53504 closed
Jun 3, 2025 -
CI test linux://python/ray/tests:test_basic_4 is consistently_failing
#53503 closed
Jun 3, 2025 -
CI test linux://python/ray/tests:test_basic_2 is consistently_failing
#53502 closed
Jun 3, 2025 -
CI test linux://python/ray/tests:test_utils is consistently_failing
#53501 closed
Jun 3, 2025 -
CI test linux://python/ray/tests:test_runtime_env_ray_minimal is consistently_failing
#53500 closed
Jun 3, 2025 -
CI test linux://python/ray/tests:test_minimal_install is consistently_failing
#53496 closed
Jun 3, 2025 -
CI test linux://python/ray/tests:test_bundle_label_selector is consistently_failing
#53497 closed
Jun 3, 2025 -
CI test linux://python/ray/tests:test_path_utils is consistently_failing
#53498 closed
Jun 3, 2025 -
CI test linux://python/ray/tests:test_label_scheduling is consistently_failing
#53499 closed
Jun 3, 2025 -
CI test linux://python/ray/tests:test_basic is consistently_failing
#51974 closed
Jun 3, 2025 -
CI test linux://python/ray/tests:test_usage_stats is consistently_failing
#49672 closed
Jun 3, 2025 -
CI test linux://python/ray/tests:test_output is consistently_failing
#48551 closed
Jun 3, 2025 -
CI test linux://python/ray/dashboard:test_dashboard is consistently_failing
#44917 closed
Jun 3, 2025 -
[Core] Ray CUDA Images on 2.45+ are missing required NVIDIA driver
#53266 closed
Jun 2, 2025 -
CI test linux://rllib:learning_tests_multi_agent_cartpole_ppo_multi_cpu is flaky
#47465 closed
Jun 2, 2025 -
[RLlib] Getting and setting state of APPO Algorithm object raises error
#53468 closed
Jun 1, 2025 -
[core] Correlated performance regression across multiple microbenchmarks
#53333 closed
Jun 1, 2025 -
CI test darwin://python/ray/tests:test_tempfile is consistently_failing
#53431 closed
Jun 1, 2025 -
[Data]: Categorizer fails with non uniform distributions
#50792 closed
May 30, 2025 -
CI test windows://python/ray/tests:test_ray_init is flaky
#52808 closed
May 29, 2025 -
CI test windows://python/ray/tests:test_worker_graceful_shutdown is consistently_failing
#53381 closed
May 29, 2025
124 Issues opened by 94 people
-
[data] Custom local shuffling batcher for Dataset.iter_*
#54197 opened
Jun 28, 2025 -
CI test darwin://python/ray/dashboard:modules/aggregator/tests/test_aggregator_agent is consistently_failing
#54191 opened
Jun 28, 2025 -
[RLlib] Restored *custom* metrics after check-pointing is broken since 2.47
#54174 opened
Jun 27, 2025 -
Ray&slurm&NPU
#54170 opened
Jun 27, 2025 -
[core][gpu-objects] Hide the details of constructing process groups
#54168 opened
Jun 27, 2025 -
[core][gpu-objects] Support streaming generator
#54167 opened
Jun 27, 2025 -
[core][gpu-objects] Support DTensor
#54166 opened
Jun 27, 2025 -
ERROR services.py:1355 -- Failed to start the dashboard , return code 3221226505
#54165 opened
Jun 27, 2025 -
CI test linux://python/ray/data:test_block_sizing is consistently_failing
#54164 opened
Jun 27, 2025 -
Assessment of the difficulty in porting CPU architecture for Ray
#54162 opened
Jun 27, 2025 -
CI test linux://python/ray/tests:test_scheduling_client_mode is flaky
#54160 opened
Jun 27, 2025 -
[core] ray.util.state.api.get_actor with timeout = 1s does not work
#54153 opened
Jun 26, 2025 -
Ray component: Core ray.init() fails on windows since #51731
#54151 opened
Jun 26, 2025 -
[core] Improving Ray Typing annotation
#54149 opened
Jun 26, 2025 -
[Core] bug in _check_uv_existence() method of uv runtime backend breaks installing packages in ray runtimes
#54134 opened
Jun 26, 2025 -
Release test air_example_gptj_deepspeed_fine_tuning failed
#54133 opened
Jun 26, 2025 -
[Core] ray job submit may hang in some scenarios
#54120 opened
Jun 26, 2025 -
[Docker] [CI] Bump the GPU base image to a newer version
#54102 opened
Jun 25, 2025 -
[Core] ray._raylet.CoreWorker.put_file_like_object, parameter owner_address unused
#54100 opened
Jun 25, 2025 -
[Data] Allow parameterized queries in `read_sql`
#54098 opened
Jun 25, 2025 -
[RLlib] num_env_steps_sampled_lifetime is wrong after checkpoint loaded - bug changed in 2.47
#54089 opened
Jun 25, 2025 -
[Core] When pinning object, transient error on RPC `PubsubLongPolling` causes job stuck
#54081 opened
Jun 25, 2025 -
[serve.llm] vLLM engine became unhealthy under high incoming traffic
#54070 opened
Jun 25, 2025 -
[data] support streaming writes for `write_lance`
#54069 opened
Jun 25, 2025 -
[train] Can not start training on more than one node
#54065 opened
Jun 25, 2025 -
[train] Add Azure Files support to persistent storage documentation
#54054 opened
Jun 24, 2025 -
[Core] ray cannot start under macos + anaconda + python 3.13 + bash
#54047 opened
Jun 24, 2025 -
[Core] Ray postmortem debugging does not work with python 3.12
#54044 opened
Jun 24, 2025 -
[RFC] Improving Ray for Post-Training / RL for LLM Projects
#54021 opened
Jun 23, 2025 -
[Core] Multi-threaded ray.get can hang in certain situations.
#54007 opened
Jun 23, 2025 -
[CI] `linux://python/ray/tests:test_scheduling_debug_mode` is failing/flaky on master.
#54002 opened
Jun 23, 2025 -
Ray worker resolves module to __init__.py instead of actual file for nested package class
#53998 opened
Jun 22, 2025 -
[data] Slow fetching of metadata for large number of parquet files
#53995 opened
Jun 22, 2025 -
[Rllib] Bug in TorchMultiDistribution logp prevents policy mapping from being used
#53994 opened
Jun 22, 2025 -
[core][gpu-objects] Allow sending ObjectRefs to other processes
#53978 opened
Jun 20, 2025 -
[core][gpu-objects] Support ray.put
#53977 opened
Jun 20, 2025 -
[core][gpu-objects] RDMA support for data transfer
#53976 opened
Jun 20, 2025 -
[Dashboard] Support for List Tasks Filter Pushdown
#53970 opened
Jun 20, 2025 -
[Data] Add support to turn off strict block-size enforcement
#53954 opened
Jun 19, 2025 -
[Core] `InternalKVPut` retries incorrectly when encountering transient error
#53946 opened
Jun 19, 2025 -
PolicyServer and PolicyClient Demo Issue
#53926 opened
Jun 18, 2025 -
Windows VS WSL2
#53924 opened
Jun 18, 2025 -
[Docker][CI] Add Python 3.13 Ray Image to CI
#53923 opened
Jun 18, 2025 -
[serve.llm] Ray LLM serving not respecting max_completion_tokens parameter
#53922 opened
Jun 18, 2025 -
[Ray V2 Tune + Train] Tuner is not aware of resources and oversubscribes leading to deadlocks
#53921 opened
Jun 18, 2025 -
[Data/Preprocessors]: Preprocessors do not work with nested records
#53920 opened
Jun 18, 2025 -
[Core] Ray Does Not Detect GPU
#53919 opened
Jun 18, 2025 -
Multiple CVEs in Ray's compiled dependencies
#53915 opened
Jun 18, 2025 -
Using ray for LLM inference got errors
#53907 opened
Jun 18, 2025 -
[CI] `linux://python/ray/data:test_consumption` is failing/flaky on master.
#53897 opened
Jun 17, 2025 -
[Ray Data]Pylint detection found some Python code defects in ray data
#53881 opened
Jun 17, 2025 -
[dashboard] Support to overwrite the _client_max_size of http request entity
#53879 opened
Jun 17, 2025 -
[RLlib] Significant drop in DQN training reward when resuming from checkpoint
#53878 opened
Jun 17, 2025 -
[RLlib] Checkpoint metrics loading with Tune is broken in 2.47.0
#53877 opened
Jun 17, 2025 -
Issue: Ray Dashboard Links to Grafana Return "Dashboard Not Found" (Windows)
#53876 opened
Jun 17, 2025 -
[serve.llm] LLM serving seems not working with mistral tokenizer.
#53873 opened
Jun 17, 2025 -
[Core] ray.ActorID.nil().job_id
#53872 opened
Jun 17, 2025 -
[Core] Ray 2.47 regression: All tasks hang when using `uv`
#53848 opened
Jun 16, 2025 -
[RLlib] Typo in error message on line 37 of ray/rllib/env/utils/__init__.py
#53841 opened
Jun 16, 2025 -
[rllib] [bug] Official PPO Atari example fails with IndexError
#53836 opened
Jun 15, 2025 -
[Tune|RLlib] PBT reward drop - not checkpointing or restoring properly
#53831 opened
Jun 14, 2025 -
[Dashboard] Discrepancy between Worker Process Memory Display on Dashboard and RSS Statistics
#53829 opened
Jun 14, 2025 -
[flaky] test_scheduling_2.py::test_demand_report_when_scale_up
#53811 opened
Jun 13, 2025 -
Release test random_shuffle_fixed_size failed
#53806 opened
Jun 13, 2025 -
[Data] Custom Partitioner in Ray Data and Related Implementation Considerations
#53800 opened
Jun 13, 2025 -
[Core] Transient network failure on RPC `WaitForActorRefDeleted` causes actor registration fail
#53797 opened
Jun 13, 2025 -
How to enable tool calling in serve llm?
#53795 opened
Jun 13, 2025 -
[RLlib] Checkpointing fails with CUDA GPU learner using the new API stack
#53793 opened
Jun 13, 2025 -
[<Ray component: Core|RLlib|etc...>] Issue of port allocation
#53790 opened
Jun 13, 2025 -
[RLlib][Unity] unity3d_env_local.py 'NoneType' for action spaces
#53780 opened
Jun 12, 2025 -
Support gymnasium > 1.0.0
#53776 opened
Jun 12, 2025 -
[Dashboard] Support ncu
#53759 opened
Jun 12, 2025 -
[Core] Ray hangs with vllm0.8.5 v1 api for tp8+pp4
#53758 opened
Jun 12, 2025 -
Core: Ray 2.45 causes Google's LIBTPU to be very spammy
#53756 opened
Jun 12, 2025 -
[core] Race condition between raylet graceful shutdown and GCS health checks
#53739 opened
Jun 11, 2025 -
[Announcement] Ray Summit 2025 Call for Proposals Due June 30th
#53729 opened
Jun 11, 2025 -
[core] Actor restarts don't work when an actor creation arg is evicted from plasma
#53727 opened
Jun 10, 2025 -
[Core] Custom docker image not scaling out
#53696 opened
Jun 10, 2025 -
[<Ray component: Core|RLlib|etc...>] SAC config error about framework
#53694 opened
Jun 10, 2025 -
[Data] Support for SQL/DataFrame capability
#53693 opened
Jun 10, 2025 -
[Dashboard] Display gpu metrics for AMD/ROCm devices
#53689 opened
Jun 10, 2025 -
[Serve] Serve-native CPU profiling in Replicas is broken
#53677 opened
Jun 9, 2025 -
[Dask-on-Ray,core] Tasks not registering on the jobs and job is subsequently getting stuck
#53666 opened
Jun 9, 2025 -
[Ray Core] Detached actor doesn't finish method after the client disconnects
#53665 opened
Jun 9, 2025 -
[Ray serve] Unable to serve meta-llama/Llama-3.1-8B-Instruct
#53663 opened
Jun 9, 2025 -
[core][autoscaler] Select different node types when a node type is unavailable
#53636 opened
Jun 7, 2025 -
[core][dashboard]: Package already exists, skipping upload.
#53635 opened
Jun 7, 2025 -
[core][compiled graphs] Slow NCCL init on H200 server
#53619 opened
Jun 6, 2025 -
[Ray Core/Dashboard] - Installing Ray via UV breaks dashboard.
#53608 opened
Jun 6, 2025 -
[<Ray component: Data>] lack of check for empty table produce lots of error messages
#53605 opened
Jun 6, 2025 -
Allow CPU Only Run
#53603 opened
Jun 6, 2025 -
[Core] ray._raylet.ObjectRef and ray.types.ObjectRef type compabtibility
#53591 opened
Jun 5, 2025 -
[Serve] Autoscaling not working correctly when `max_replica_per_node` is set in Ray Serve
#53582 opened
Jun 5, 2025 -
[Ray Train Feature Request] Native Cross-Validation Support in Ray Train API
#53581 opened
Jun 5, 2025 -
[Serve] Unable to load meta-llama/Llama-3.3-70B-Instruct
#53571 opened
Jun 4, 2025 -
[core] TPU Visible Chips not set correctly
#53569 opened
Jun 4, 2025 -
[Data] Add support for streaming version of `repartition(key=...)`
#53560 opened
Jun 4, 2025 -
[RayData] The write operator supports the use of an actor pool
#53552 opened
Jun 4, 2025 -
Release test sort_autoscaling failed
#53546 opened
Jun 4, 2025 -
[Serve][llm] Make Serve LLM endpoint 100% compatible with the engine's native server.
#53533 opened
Jun 3, 2025 -
pyarrow.lib.ArrowInvalid: Struct child array #5 does not match type field: null vs double
#53529 opened
Jun 3, 2025 -
[RLlib] Is IMPALA applicable to continuous action spaces?
#53521 opened
Jun 3, 2025 -
[core] ray.init() not possible even while on same network as Ray Cluster.
#53520 opened
Jun 3, 2025 -
[CI][Tune] Enable isort for `python/ray/tune/__init__.py` and fix circular imports
#53518 opened
Jun 3, 2025 -
[Core] ASSERTION FAILED: queue.num_items() == 0
#53510 opened
Jun 3, 2025 -
[Ray Data] Filtering function is very slow
#53493 opened
Jun 2, 2025 -
[Serve.llm] Clean up output logs and give option to opt out of different verbosity levels
#53492 opened
Jun 2, 2025 -
[core|serve] Migrate shared utilities from `ray._private` to `ray._common`
#53478 opened
Jun 2, 2025 -
[Ray Serve] GCS Segmentation Fault on failed Redis requests
#53475 opened
Jun 2, 2025 -
[Data] dataset.write_iceberg to support upserts
#53438 opened
May 30, 2025 -
Replace SHA-1 usage with safer alternatives (e.g., SHA-256)
#53435 opened
May 30, 2025 -
[RLLIB] EnvContext.vector_index is always 0
#53419 opened
May 29, 2025 -
AttributeError: 'NoneType' object has no attribute 'enable_rl_module_and_learner' with highway-env
#53398 opened
May 29, 2025
1,593 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[core][compiled graphs] Supporting allreduce on list of input nodes
#51047 commented on
Jun 25, 2025 • 32 new comments -
[Core] Add default Ray Node labels at Node init
#53360 commented on
Jun 25, 2025 • 28 new comments -
Add progress bars to hash operators
#53175 commented on
Jun 28, 2025 • 25 new comments -
[core] add ray.util.concurrent.futures.RayExecutor
#51933 commented on
Jun 27, 2025 • 12 new comments -
[RLlib] ConnectorV2 API polishings (stricter input-/output batch formats).
#53328 commented on
Jun 18, 2025 • 11 new comments -
[Docs][KubeRay] Add guide for writing KubeRay doctests
#51708 commented on
Jun 25, 2025 • 11 new comments -
[Refactor]Rename NCCL-related items to comm_backend
#51061 commented on
Jun 24, 2025 • 8 new comments -
update to protbuf-28.2, absl-20240722, grpc-1.67 and patch for windows
#51673 commented on
Jun 23, 2025 • 6 new comments -
feat(runtime_env): add Azure Blob Storage support
#53135 commented on
Jun 27, 2025 • 4 new comments -
Add generic item support for queue
#46849 commented on
Jun 18, 2025 • 4 new comments -
[Dashboard] Add GPU component usage
#52102 commented on
Jun 24, 2025 • 4 new comments -
Adapt Dask on Ray to the new Dask Task class
#52589 commented on
Jun 28, 2025 • 4 new comments -
[RLlib] Add NPU and HPU support to RLlib
#49535 commented on
Jun 17, 2025 • 3 new comments -
Add Apple silicon GPU(mps) support to ray
#38464 commented on
Jun 26, 2025 • 2 new comments -
Relax check_version_info to check for bytecode compatibility
#41373 commented on
Jun 19, 2025 • 2 new comments -
[core] Support `.options` chaining in `actor.options`
#51836 commented on
Jun 25, 2025 • 2 new comments -
[Core] Ensure Ray vendored libraries only be visible and used by Ray internal
#52905 commented on
Jun 17, 2025 • 1 new comment -
[data] add better support for list-typed fields when using `write_bigquery`
#44564 commented on
Jun 28, 2025 • 1 new comment -
[core][collective] Avoid creation of `gloo_queue` in race condition
#50132 commented on
Jun 22, 2025 • 1 new comment -
[Docs] Clarify Train-side docs on Ray Data
#53349 commented on
Jun 28, 2025 • 1 new comment -
[RLlib] Enable Training from Replay Buffer Larger than Memory
#23816 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] [Bug] Inconsistent behavior between TFPolicy and TorchPolicy on `compute_actions_from_input_dict`
#24007 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] [Bug] IMPALA causes an OOM after a long running.
#23769 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Categorical action dist incorrectly uses tf.random.categorical
#24055 commented on
Jun 16, 2025 • 0 new comments -
[BUG] Ray dashboard client failed to build
#23548 commented on
Jun 16, 2025 • 0 new comments -
[RFC][Feature][Autoscaler][Core]Graceful draining of nodes while scale-down
#23522 commented on
Jun 16, 2025 • 0 new comments -
[ml][Improvement] Improve messages to be “rank0, rank1” actors etc.
#23310 commented on
Jun 16, 2025 • 0 new comments -
[Feature] [tune] create a mlflow run name from config params
#23228 commented on
Jun 16, 2025 • 0 new comments -
[Feature][RLlib] Improve pytorch memory usage by disabling caching
#23077 commented on
Jun 16, 2025 • 0 new comments -
[tune][Bug] Worker doesn't sync the logs to HDFS at the given interval
#23055 commented on
Jun 16, 2025 • 0 new comments -
[Bug] AdaBelief optimizer crashes checkpoint restore
#22976 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Custom model with R2D2
#22747 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Resources displayed in Dashboard don't match cluster configuration
#22548 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Deletion of Ray clusters hangs while Ray operator is still up
#22505 commented on
Jun 16, 2025 • 0 new comments -
Doing import ray breaks my logging [Bug]
#22312 commented on
Jun 16, 2025 • 0 new comments -
[Feature][Client] remove ray.disconnect() and ray.connect()
#22125 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Detached actor exceptions are not logged.
#21810 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Sometimes the worker node logs in the ray dashboard are empty
#21785 commented on
Jun 16, 2025 • 0 new comments -
[AIR] Status updates still prints even with breakpoint
#28554 commented on
Jun 17, 2025 • 0 new comments -
[Core] RBAC + auditability
#25845 commented on
Jun 16, 2025 • 0 new comments -
[Core] Arrow Flight Server doesn't work with Ray Actors due to two GRPC versions
#25774 commented on
Jun 16, 2025 • 0 new comments -
[Core | State Observability ] Refactor summary/log SDK to use StateApiClient
#25746 commented on
Jun 16, 2025 • 0 new comments -
[Serve] Deployment fails if name contains slashes
#25714 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] ModelCatagolg Selects Wrong Model for Nested Complex Observations
#25619 commented on
Jun 16, 2025 • 0 new comments -
[Train] [Tune] When using Train with Tune, a `logdir` is created that's not the one specified by the user
#25474 commented on
Jun 16, 2025 • 0 new comments -
[Core][Observability] Ray memory should show more objects
#25463 commented on
Jun 16, 2025 • 0 new comments -
[Dashboard] Error during render node with gpu and 4 hdds
#25437 commented on
Jun 16, 2025 • 0 new comments -
[Ray Collective Lib] Enable CI
#25396 commented on
Jun 16, 2025 • 0 new comments -
Core: deamonset feature request
#25334 commented on
Jun 16, 2025 • 0 new comments -
[DeviceMesh][Collective] Support multiple tensors API
#25129 commented on
Jun 16, 2025 • 0 new comments -
[Ray Air] nan in the tensorflow_linear_dataset_example.py
#25037 commented on
Jun 16, 2025 • 0 new comments -
ray docker images do not have uvloop installed
#25023 commented on
Jun 16, 2025 • 0 new comments -
Ray Tune: No console output is logged to Wandb.
#25011 commented on
Jun 16, 2025 • 0 new comments -
[Core][RLlib][Tune] CUDA PTX error when training with Tune
#25001 commented on
Jun 16, 2025 • 0 new comments -
Ray component: Core: PoolActor processes hanging
#24784 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Duplicate custom metrics
#24731 commented on
Jun 16, 2025 • 0 new comments -
[Serve] Asynchronous inference best practices
#24627 commented on
Jun 16, 2025 • 0 new comments -
[tune] `progress_reporter.py` is messy and should be cleaned up
#24604 commented on
Jun 16, 2025 • 0 new comments -
[aws][autoscaler] AWS: When using spot instances, always single availability zone is selected
#24310 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] PPO - ray.rllib.agents.ppo "Put Error"
#24307 commented on
Jun 16, 2025 • 0 new comments -
[Ray Collective] Remove Redis store and LocalFile store from gloo mode.
#24288 commented on
Jun 16, 2025 • 0 new comments -
[Autoscaler] upscaling_speed: 0 gets reset to 1
#24177 commented on
Jun 16, 2025 • 0 new comments -
[dashboard] clicking on a column to sort makes the UI blank
#13525 commented on
Jun 16, 2025 • 0 new comments -
Autoscaler does not respect --num-cpus argument to `ray start`
#13270 commented on
Jun 16, 2025 • 0 new comments -
[core] Number of CPUs in ray.available_resources() does not match Dashboard's Machine View
#13100 commented on
Jun 16, 2025 • 0 new comments -
atexit handlers don't run when actor is terminated from going out of scope
#12806 commented on
Jun 16, 2025 • 0 new comments -
Task Cancellation is broken for queued tasks
#12080 commented on
Jun 16, 2025 • 0 new comments -
[logging] Use 'warnings.warn' appropriately
#12060 commented on
Jun 16, 2025 • 0 new comments -
[Dashboard] New dashboard port errors in a large cluster.
#11638 commented on
Jun 16, 2025 • 0 new comments -
ES Trainer does not support evaluation workers
#10999 commented on
Jun 16, 2025 • 0 new comments -
[Plasma] Improve plasma documentation on distributed storage
#10858 commented on
Jun 16, 2025 • 0 new comments -
Unable to connect to ray head running on linux from ray worker node on windows
#10362 commented on
Jun 16, 2025 • 0 new comments -
Ray log tracing
#9786 commented on
Jun 16, 2025 • 0 new comments -
[Core] Logging policy should be clearly defined and needs unit test coverage
#9692 commented on
Jun 16, 2025 • 0 new comments -
[dashboard] Error on Infinity values
#9103 commented on
Jun 16, 2025 • 0 new comments -
"ray timeline" command fails when RAY_ADDRESS is set
#8951 commented on
Jun 16, 2025 • 0 new comments -
[tune] [dashboard] Table formatting issues due to too many hparams
#8667 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] "Cannot perform an interactive login from a non TTY device" when trying to use a private docker registry
#7339 commented on
Jun 16, 2025 • 0 new comments -
Allowing multiple users to access a single ray cluster
#6800 commented on
Jun 16, 2025 • 0 new comments -
[Ray core & ray cluster] Add diagrams/architectures to explain how to run ray locally vs remotely
#25663 commented on
Jun 16, 2025 • 0 new comments -
[Ray Clusters] Remove nightly and latest images and wheels from all example configs.
#25606 commented on
Jun 16, 2025 • 0 new comments -
[air] Consider having a preprocessor for Feast integration
#25559 commented on
Jun 16, 2025 • 0 new comments -
[Core] Open telemetry Context pass from ray client to actors
#25538 commented on
Jun 16, 2025 • 0 new comments -
[dataset] Reduce tasks in push-based shuffle are not evenly distributed
#25468 commented on
Jun 16, 2025 • 0 new comments -
[Core] [State Observability] List all actor logs when actors are restarted.
#25443 commented on
Jun 16, 2025 • 0 new comments -
[air] Ordinal Encoder complains about None
#25442 commented on
Jun 16, 2025 • 0 new comments -
[Autoscaler] google-cloud-storage seems cannot read GOOGLE_APPLICATION_CREDENTIALS
#25308 commented on
Jun 16, 2025 • 0 new comments -
[Serve] Dynamically move models between CPUs and GPUs
#25295 commented on
Jun 16, 2025 • 0 new comments -
[RLlib][Doc] Add documentation for `ModelCatalog.get_model_v2()`
#25186 commented on
Jun 16, 2025 • 0 new comments -
[Core][Feature] Add checksum support for object store.
#21782 commented on
Jun 16, 2025 • 0 new comments -
Setting VF_SHARE_LAYERS to False and NO_FINAL_LINEAR to true leads to a bug
#21756 commented on
Jun 16, 2025 • 0 new comments -
[Feature] [runtime env] support using different python versions in Ray cluster
#21597 commented on
Jun 16, 2025 • 0 new comments -
[Feature] [Serve] Request Redistribution Among Replicas
#21578 commented on
Jun 16, 2025 • 0 new comments -
Put failed error occurred when shutdown and init again at client mode
#21573 commented on
Jun 16, 2025 • 0 new comments -
[Core] [Bug] No timeout or deadlock on scheduling job in remote cluster
#21419 commented on
Jun 16, 2025 • 0 new comments -
[Feature] [Autoscaler] Scaling Intelligently Based on Observed Resource Bottlenecks (related: task & actor profiling)
#21301 commented on
Jun 16, 2025 • 0 new comments -
[Core] [Bug] Failed to register worker to Raylet for single node, multi-GPU
#21226 commented on
Jun 16, 2025 • 0 new comments -
[Train] Port over `timm` example to Train
#21020 commented on
Jun 16, 2025 • 0 new comments -
[Bug] [RLlib] Custom metrics are not reported to Tune
#20938 commented on
Jun 16, 2025 • 0 new comments -
[Train] Deepspeed support
#20648 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Cannot start cluster if other user is already running one
#20634 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Excess memory usage when scheduling tasks in parallel?
#20618 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Ray auto init interacts badly with allow_multiple=True and kills python shell
#20355 commented on
Jun 16, 2025 • 0 new comments -
[Bug] BasicVariantGenerator not compatible with Repeater
#19879 commented on
Jun 16, 2025 • 0 new comments -
[Feature] Support Sigopt for Tune standard space definitions
#19018 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Re-enable Worker in Container Tests.
#18787 commented on
Jun 16, 2025 • 0 new comments -
[GCP][autoscaler] Rework ray TPU demos to create nothing but TPU VMs (no harddrives / `n2-standard-2` instances)
#18645 commented on
Jun 16, 2025 • 0 new comments -
[datasets] `random_shuffle` overspills objects on random node
#17612 commented on
Jun 16, 2025 • 0 new comments -
[Core] Ray Actor abnormal exit problem && Reproduction
#17198 commented on
Jun 16, 2025 • 0 new comments -
Support resizing placement groups
#16403 commented on
Jun 16, 2025 • 0 new comments -
[dashboard] Errors are not shown
#15238 commented on
Jun 16, 2025 • 0 new comments -
changing the docker image in consecutive `ray up` calls fails.
#14990 commented on
Jun 16, 2025 • 0 new comments -
[metrics] Add regression tests for Prometheus metrics
#14614 commented on
Jun 16, 2025 • 0 new comments -
[dashboard] Show more nodes at a time instead of paging through
#14537 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] Support Memory Aware Scheduling on a multi-node-type cluster.
#14104 commented on
Jun 16, 2025 • 0 new comments -
[RFC] k8s-native worker pool
#14077 commented on
Jun 16, 2025 • 0 new comments -
[AIR/Tune] Session report does not show the key for those not included in the first metrics report
#28549 commented on
Jun 17, 2025 • 0 new comments -
[Core] dump the info and anaylze the data offline
#28496 commented on
Jun 17, 2025 • 0 new comments -
[Core] Document what are the generic python code that's easily scalable.
#28487 commented on
Jun 17, 2025 • 0 new comments -
[AIR] Refactor checkpoint encoding and decoding out of Backend to framework-specific Checkpoints
#28462 commented on
Jun 17, 2025 • 0 new comments -
[Core] [RLlib] RLlib on Ray 2.0 not easily working on Colab
#28457 commented on
Jun 17, 2025 • 0 new comments -
[Job Submission] Support env file input in ray.runtime_env.RuntimeEnv
#28453 commented on
Jun 17, 2025 • 0 new comments -
[Tune] Adding DEHB
#28427 commented on
Jun 17, 2025 • 0 new comments -
[serve] Gradio integration does surface error messages, runs indefinitely
#28399 commented on
Jun 17, 2025 • 0 new comments -
[Ray: Core] Ray can hang when getting an ObjectRef from an unknown environment
#28341 commented on
Jun 17, 2025 • 0 new comments -
[Core][RuntimeEnv]Make `job_submission_id` to a new index of GCS::JobTableData
#28337 commented on
Jun 17, 2025 • 0 new comments -
[Jobs] Run jobs tests on Windows
#28316 commented on
Jun 17, 2025 • 0 new comments -
[Core] Job stop should terminate runtime_env setup
#28221 commented on
Jun 17, 2025 • 0 new comments -
[Core] log_to_driver=False does not suppress worker errors in ipython
#28216 commented on
Jun 17, 2025 • 0 new comments -
[core][runtime envs] Ray should respect CUDA_VISIBLE_DEVICES if set in runtime env
#28215 commented on
Jun 17, 2025 • 0 new comments -
[Core] ray dashboard <rayhost>:8265/nodes?view=details cpuPercent should contains actor's subprocess
#28100 commented on
Jun 17, 2025 • 0 new comments -
[Core] Ray may hang if workers fail to start due to limited ports
#28071 commented on
Jun 17, 2025 • 0 new comments -
[<Ray component: Core|Cluster>] Documentation instructions for mounting AWS EFS Fails for Ray Cluster
#28057 commented on
Jun 17, 2025 • 0 new comments -
[Core] Support retry_delay option in Ray tasks
#28015 commented on
Jun 17, 2025 • 0 new comments -
[Core] Ray object primary copy transfer
#27985 commented on
Jun 17, 2025 • 0 new comments -
[Observability] ray timeline errors with ray.rpc.GetAllProfileInfoReply exceeded maximum protobuf size of 2GB
#27952 commented on
Jun 17, 2025 • 0 new comments -
[Core] allow customized error message for WorkerCrashedError
#27947 commented on
Jun 17, 2025 • 0 new comments -
tensorflow.python.framework.errors_impl.NotFoundError: ./multi_worker_model/variables/variables_temp/part-00000-of-00001.index; No such file or directory [Op:MergeV2Checkpoints]
#27938 commented on
Jun 17, 2025 • 0 new comments -
[observability] JSON file generated by Ray timeline doesn't render correctly in the new version of chrome tracing (perfetto)
#27921 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] make policy evaluation support Attention nets
#27909 commented on
Jun 17, 2025 • 0 new comments -
[WIP] Remove global worker
#53372 commented on
Jun 18, 2025 • 0 new comments -
[Autoscaler] Delete AWS resources created when launching Ray cluster upon cluster termination
#29499 commented on
Jun 17, 2025 • 0 new comments -
[Core] util.multiprocessing.pool: imap and imap_unordered blocking on ray.wait even though processes are complete
#29466 commented on
Jun 17, 2025 • 0 new comments -
[Ray Log_monitor]: close_all_files ProcessLookupError
#29452 commented on
Jun 17, 2025 • 0 new comments -
Ray core: incorrect account of GPUs on ec2 ubuntu instance: g4dn.2xlarge
#29420 commented on
Jun 17, 2025 • 0 new comments -
[core] GCS segfaults under OOM
#29336 commented on
Jun 17, 2025 • 0 new comments -
[AIR] Add progress bar for training
#29314 commented on
Jun 17, 2025 • 0 new comments -
[CI] A simple way to reproduce osx/linux/windows CI run failure locally
#29068 commented on
Jun 17, 2025 • 0 new comments -
[Core] Is it possible to do asynchroneous task submission?
#29039 commented on
Jun 17, 2025 • 0 new comments -
[doc][core] multiprocessing.Pool should document resource usage with ray_remote_args
#29004 commented on
Jun 17, 2025 • 0 new comments -
[Train] Allow passing in placement group to trainer
#28924 commented on
Jun 17, 2025 • 0 new comments -
[<Algorithm overview>]
#28915 commented on
Jun 17, 2025 • 0 new comments -
Runtime Environment Dependencies- container per task
#28875 commented on
Jun 17, 2025 • 0 new comments -
[Ray component: Core] Returning to much data from ray remote fails with no error
#28855 commented on
Jun 17, 2025 • 0 new comments -
Issue on page /ray-core/examples/plot_parameter_server.html
#28854 commented on
Jun 17, 2025 • 0 new comments -
[Datasets] Why does pydantic make training slower?
#28836 commented on
Jun 17, 2025 • 0 new comments -
[Infra] Improve Ray client usability
#28790 commented on
Jun 17, 2025 • 0 new comments -
[Core] Download Logs from Ray Dashboard
#28788 commented on
Jun 17, 2025 • 0 new comments -
Ray Core: AttributeError: 'NoneType' object has no attribute 'enum_types_by_name'
#28779 commented on
Jun 17, 2025 • 0 new comments -
[Tune] HyperOptSearch fails with nested config dicts and points_to_evaluate
#28753 commented on
Jun 17, 2025 • 0 new comments -
Ray Deployment crashes in docker [<Ray component: Serve>]
#28732 commented on
Jun 17, 2025 • 0 new comments -
[Ray Serve]: Testing out on local using Docker container
#28692 commented on
Jun 17, 2025 • 0 new comments -
[core] Generator task that returns more values than specified by num_returns should throw error instead
#28689 commented on
Jun 17, 2025 • 0 new comments -
[Core] CloudPickle explain tool
#28585 commented on
Jun 17, 2025 • 0 new comments -
[dashboard] Dashboard randomly not showing the status of worker nodes.
#28569 commented on
Jun 17, 2025 • 0 new comments -
[Job] Job submission not following convention for quote
#26514 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Unable to call ray.remote functions inside env/action dist
#26468 commented on
Jun 16, 2025 • 0 new comments -
[Core] Observing Multiple Exceptions When Using Different Python Patch Versions
#26443 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Use observations (input_dict) for exploration
#26437 commented on
Jun 16, 2025 • 0 new comments -
[core][c++ worker] RayClusterModeTest.DefaultActorLifetimeTest timed out in macOS
#26435 commented on
Jun 16, 2025 • 0 new comments -
[Ray component: Core] Enable better progress bar
#26426 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Issue Regarding Future Warnings
#26424 commented on
Jun 16, 2025 • 0 new comments -
[doc][Core | State Observability] Document usage of the rate limiting env variable in public doc
#26370 commented on
Jun 16, 2025 • 0 new comments -
[Tune] NevergradSearch Budget Exception
#26305 commented on
Jun 16, 2025 • 0 new comments -
How to color-code console output
#26226 commented on
Jun 16, 2025 • 0 new comments -
[Core] [Quality] Live handle raises unnecessary exception when script ends
#26198 commented on
Jun 16, 2025 • 0 new comments -
[RLlib]: SimpleQ TF2 is broken
#26192 commented on
Jun 16, 2025 • 0 new comments -
[State Observability] Raise an exception if the state schema contains predicates.
#26125 commented on
Jun 16, 2025 • 0 new comments -
[air] We should have a convenient method for user to interact with checkpoint file on driver when they checkpoint using other method in session
#26082 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] server reports nan episodes and empty policy
#26048 commented on
Jun 16, 2025 • 0 new comments -
[test][autoscaler] ModuleNotFoundError: No module named 'ray.tests'
#26023 commented on
Jun 16, 2025 • 0 new comments -
[Tune] Ray Tune doesn't work inside Spark UDF
#26002 commented on
Jun 16, 2025 • 0 new comments -
[Serve] A Deployment Graph with unfulfilled demands fails to scale Pods in Kubernetes
#25998 commented on
Jun 16, 2025 • 0 new comments -
API server internal error message not useful
#25986 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] Bad runtime env specified in ray.init() with eager install only raises error on task/actor invocation
#25972 commented on
Jun 16, 2025 • 0 new comments -
[Core][State Observability] Use a separate thread to run spill/restore
#25960 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] KeyError: simple_list_collector.py, line 950, in postprocess_episode
#25938 commented on
Jun 16, 2025 • 0 new comments -
[RLLib] SampleBatch.update() doesn't update `added_keys`
#25937 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] Use namespace for internal KV storage
#25897 commented on
Jun 16, 2025 • 0 new comments -
[Core?] Federation + data perimeters
#25846 commented on
Jun 16, 2025 • 0 new comments -
[tune] allow using (nested) data classes for search space definition
#27904 commented on
Jun 17, 2025 • 0 new comments -
[Autoscaler][GCP] Autofill GCP node type resources
#27888 commented on
Jun 17, 2025 • 0 new comments -
[Core] The Idle worker killing feature slows down tasks
#27863 commented on
Jun 17, 2025 • 0 new comments -
[AIR] Support TorchRec trainer
#27794 commented on
Jun 17, 2025 • 0 new comments -
[Dashboard] Dashboard agent cannot be started because the port is still occupied
#27736 commented on
Jun 17, 2025 • 0 new comments -
[Tune/RLlib] log_to_file creates files, but doesn't write anything there
#27702 commented on
Jun 17, 2025 • 0 new comments -
[RLLib] global_timestep not monotonic when when running concurrent episodes with ExternalEnv
#27669 commented on
Jun 17, 2025 • 0 new comments -
[Dashboard] Ray Dashboard not showing the SpillWorker's actual memory usage
#27591 commented on
Jun 17, 2025 • 0 new comments -
[Core] The actors got distributed to just a few nodes even with spread scheduling
#27577 commented on
Jun 17, 2025 • 0 new comments -
[runtime_env] Add tests for all driver output (warnings, etc)
#27566 commented on
Jun 17, 2025 • 0 new comments -
[Tune] TuneReportCheckpointCallback causes two checkpoints to made every time it is called.
#27524 commented on
Jun 17, 2025 • 0 new comments -
[AIR] SettingWithCopyWarning for "A value is trying to be set on a copy of a slice from a DataFrame"
#27352 commented on
Jun 17, 2025 • 0 new comments -
[ray dashboard] profile button not working
#27211 commented on
Jun 17, 2025 • 0 new comments -
[Ray Train] Ray Train running slow when multiple workers executed
#27107 commented on
Jun 17, 2025 • 0 new comments -
[workflow] We should give the storage a default value if it's not set in some way.
#27046 commented on
Jun 17, 2025 • 0 new comments -
[State Observability][Log] Allow to ctrl + C when running logs API
#27008 commented on
Jun 17, 2025 • 0 new comments -
[runtime env] local `working_dir` doesn't work with strongly-typed `RuntimeEnv`
#26984 commented on
Jun 17, 2025 • 0 new comments -
[Core][State Observability] More fine-grained exceptions/error codes handling
#26974 commented on
Jun 17, 2025 • 0 new comments -
[Feature] Autoscaler should understand AWS availability and act accordingly
#20774 commented on
Jun 16, 2025 • 0 new comments -
[Core] Typing for .options for Ray Tasks
#26871 commented on
Jun 16, 2025 • 0 new comments -
[State Observability] Support filter None value
#26820 commented on
Jun 16, 2025 • 0 new comments -
[Core] Batch PinObjectIDs requests from Raylet client
#26796 commented on
Jun 16, 2025 • 0 new comments -
[Train] feature request for catboost_ray
#26687 commented on
Jun 16, 2025 • 0 new comments -
[AIR/Tune] Add a `ScalingConfig`-based API to `ResourceChangingScheduler`
#26538 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] CRR and CQL consume more cpus than reported
#26533 commented on
Jun 16, 2025 • 0 new comments -
[SGD] Document best practices for Pipeline epochs
#19323 commented on
Jun 16, 2025 • 0 new comments -
[workflow] scan_prefix with pages/as geneartor
#19234 commented on
Jun 16, 2025 • 0 new comments -
[Core][usability] Improve Ray cluster start up time
#19215 commented on
Jun 16, 2025 • 0 new comments -
[Serve] Don't use `ray.wait()` to drain tracking refs in handle
#19158 commented on
Jun 16, 2025 • 0 new comments -
Unify internal configs & common datastructures
#19152 commented on
Jun 16, 2025 • 0 new comments -
Clean up EndpointState
#19148 commented on
Jun 16, 2025 • 0 new comments -
[Core][Feature] use clang-tidy/format to block usage of std::getenv
#18894 commented on
Jun 16, 2025 • 0 new comments -
[Feature][workflow] Namespace for workflow
#18818 commented on
Jun 16, 2025 • 0 new comments -
[Feature][workflow] Resource limit for workflow job
#18780 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Exception in task leads to truncated error message
#18699 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Logging config is not propagated to driver
#18660 commented on
Jun 16, 2025 • 0 new comments -
Enable copy/paste to get correct command for connecting to Ray client
#18513 commented on
Jun 16, 2025 • 0 new comments -
Ray client suppresses error messages
#18512 commented on
Jun 16, 2025 • 0 new comments -
[serve] Feature request: timeout or max_retries to limit the time spent waiting for a deployment to complete
#18432 commented on
Jun 16, 2025 • 0 new comments -
Add workflow.current_step_uuid() function
#18356 commented on
Jun 16, 2025 • 0 new comments -
[Shuffle] non streaming shuffle 5000 partitions seem to reach the scalability limit
#18333 commented on
Jun 16, 2025 • 0 new comments -
[tune] atari-impala-large.yaml does not finish gracefully
#18325 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] eagerly install for task/actor level
#18160 commented on
Jun 16, 2025 • 0 new comments -
[C++ API] Support cross-lang API with Python/Java
#18149 commented on
Jun 16, 2025 • 0 new comments -
[helm][kubernetes][test] Add formatting tests for Helm chart
#18125 commented on
Jun 16, 2025 • 0 new comments -
[workflows] Better message when not init'ed
#18121 commented on
Jun 16, 2025 • 0 new comments -
resource config is not respected in head_start_ray_commands in cluster.yaml
#18097 commented on
Jun 16, 2025 • 0 new comments -
[Dask-on-Ray] Propagate Dask-on-Ray scheduler config to (rest of) cluster
#17943 commented on
Jun 16, 2025 • 0 new comments -
[core] PlacementGroup should be no op for local_mode=True
#17937 commented on
Jun 16, 2025 • 0 new comments -
[train] fix scalability of `JsonLoggerCallback`
#21416 commented on
Jun 16, 2025 • 0 new comments -
[Feature] [runtime env] [java] select jdk version
#21239 commented on
Jun 16, 2025 • 0 new comments -
[Feature][Tune] Trial status based Stopper
#21222 commented on
Jun 16, 2025 • 0 new comments -
[Train][Tune] Unify Train and Tune Callbacks
#21065 commented on
Jun 16, 2025 • 0 new comments -
[Bug] rsync_filter isn't used in hash_runtime_conf
#20878 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] Persistent problems encountered during autoscaling can lead to driver log spam
#20855 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Test placement group chaos testing
#20716 commented on
Jun 16, 2025 • 0 new comments -
[GCP][autoscaler] Scale down is slow and Ray status doesn't show pending nodes
#20695 commented on
Jun 16, 2025 • 0 new comments -
Support snappy compression for spilled objects
#20575 commented on
Jun 16, 2025 • 0 new comments -
Sparse object reads - read part of an object, without downloading the entire object
#20500 commented on
Jun 16, 2025 • 0 new comments -
[core] Scale shuffle to 200+ nodes
#20499 commented on
Jun 16, 2025 • 0 new comments -
Memory-aware task scheduling to avoid OOMs under memory pressure
#20495 commented on
Jun 16, 2025 • 0 new comments -
[Feature] [Placement Group] Add timeout mechanism when scheduling placement group
#20477 commented on
Jun 16, 2025 • 0 new comments -
[job submission] Add RAY_ADDRESS or --address to suggested commands for logs/status
#20441 commented on
Jun 16, 2025 • 0 new comments -
[Bug] [Ray Autoscaler] [Core] Ray Worker Node Relaunching during 'ray up'
#20402 commented on
Jun 16, 2025 • 0 new comments -
[workflow] Fail to construct workflow within a workflow
#20381 commented on
Jun 16, 2025 • 0 new comments -
[Feature] [Serve] Threading for Ray Serve
#20169 commented on
Jun 16, 2025 • 0 new comments -
[Feature] [Serve] Support Sticky Sessions for Stateful Workflows Deployed via Ray Serve
#20107 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] Remove filelock dependency
#20083 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Potential deadlock in task scheduling algorithm for placement group resources.
#20051 commented on
Jun 16, 2025 • 0 new comments -
[Feature] rllib + tune metric logging selection
#19816 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] [documentation] clarify postprocess_fn usage in our doc
#19648 commented on
Jun 16, 2025 • 0 new comments -
[Feature] [runtime env] Clean up the command arguments in raylet args
#19448 commented on
Jun 16, 2025 • 0 new comments -
[client] better error message when failing to connect with client
#19371 commented on
Jun 16, 2025 • 0 new comments -
Enhance document on Java API
#17820 commented on
Jun 16, 2025 • 0 new comments -
Example for tuning layer count, dropout probabilities with Transformers
#16340 commented on
Jun 16, 2025 • 0 new comments -
Ray started in local mode doesn't restore environment variables after shutdown
#16132 commented on
Jun 16, 2025 • 0 new comments -
[core] ray.remote hides the docstring of the decorated class
#15877 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] support rsync option `--include`
#15859 commented on
Jun 16, 2025 • 0 new comments -
[Placement Group] The bundle_reservation_check_func breaks load code from local
#15840 commented on
Jun 16, 2025 • 0 new comments -
Contributor docs don't mention running tests via bazel
#15833 commented on
Jun 16, 2025 • 0 new comments -
[docs] should actor methods always have num_returns value?
#15818 commented on
Jun 16, 2025 • 0 new comments -
[rfc] Support `ray[aws,gcp,azure]` as an install target
#15725 commented on
Jun 16, 2025 • 0 new comments -
[rllib] Error while using "count_steps_by": "agent_steps" and misleading documentation
#15708 commented on
Jun 16, 2025 • 0 new comments -
Ray duplicate data from GPU to CPU when placing an actor on GPU
#15692 commented on
Jun 16, 2025 • 0 new comments -
[kubernetes] ModuleNotFoundError when executing a task on a remote cluster
#15668 commented on
Jun 16, 2025 • 0 new comments -
[cross_language] Support Python dictionaries
#15569 commented on
Jun 16, 2025 • 0 new comments -
[core] detached actor logs are not streamed to successive clients
#15549 commented on
Jun 16, 2025 • 0 new comments -
Serve Deployment with Reload Option
#15505 commented on
Jun 16, 2025 • 0 new comments -
[client][core] Have Unified `register_serializer` interface
#15486 commented on
Jun 16, 2025 • 0 new comments -
[Job submission] Monitor driver
#15480 commented on
Jun 16, 2025 • 0 new comments -
[Job submission] Java support
#15479 commented on
Jun 16, 2025 • 0 new comments -
[Job submission] Basic drop job feature
#15478 commented on
Jun 16, 2025 • 0 new comments -
Ray memory size and object store size not correct on k8s
#15463 commented on
Jun 16, 2025 • 0 new comments -
Ray status not report correctly after node crashed
#15459 commented on
Jun 16, 2025 • 0 new comments -
Async actor method hang
#15437 commented on
Jun 16, 2025 • 0 new comments -
[client] python packages version mismatch fail silently
#15407 commented on
Jun 16, 2025 • 0 new comments -
[cluster] Make node_ip_address work throughout
#15239 commented on
Jun 16, 2025 • 0 new comments -
[tune] unify run() and run_experiments()
#8127 commented on
Jun 16, 2025 • 0 new comments -
[Object Spilling] Remove the spilled directory upon Sigterm for ray start
#17790 commented on
Jun 16, 2025 • 0 new comments -
[C++ API] Support non-global named actor
#17734 commented on
Jun 16, 2025 • 0 new comments -
Cleanup stats/metrics.h
#17679 commented on
Jun 16, 2025 • 0 new comments -
workflow cli to manage all jobs
#17672 commented on
Jun 16, 2025 • 0 new comments -
[docs] Tutorial on Pytorch Lightning needs rearranging
#17611 commented on
Jun 16, 2025 • 0 new comments -
[Serve] Helper functions that are written below the actor class don't work
#17590 commented on
Jun 16, 2025 • 0 new comments -
Fix circular dependence in workflow's code
#17445 commented on
Jun 16, 2025 • 0 new comments -
[lineage] Support lineage reconstruction for borrowed ObjectRefs
#17380 commented on
Jun 16, 2025 • 0 new comments -
Errors during scaling cluster
#17292 commented on
Jun 16, 2025 • 0 new comments -
Trial is being repeated with the exact same results
#17257 commented on
Jun 16, 2025 • 0 new comments -
[RFC][Placement groups] Allow tasks to acquire resources in addition to placement group bundle
#17229 commented on
Jun 16, 2025 • 0 new comments -
runtime env in workflow
#16992 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler][core] Safe node termination
#16975 commented on
Jun 16, 2025 • 0 new comments -
[Ray Client] [Usability] Help users spot bandwidth bounded workload
#16966 commented on
Jun 16, 2025 • 0 new comments -
[docker][Clusters][autoscaler][local] Can't connect to cluster when using docker with ray cluster launcher
#16961 commented on
Jun 16, 2025 • 0 new comments -
[cli] Support redis password for all ray commands
#16921 commented on
Jun 16, 2025 • 0 new comments -
[Core] [runtime env] Use portable hash function for runtime_env_hash
#16821 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] Support rescheduling tasks when runtime env creation failed.
#16800 commented on
Jun 16, 2025 • 0 new comments -
Priority scheduling of jobs
#16782 commented on
Jun 16, 2025 • 0 new comments -
[C++ API] Completed object reference counting support
#16702 commented on
Jun 16, 2025 • 0 new comments -
[Core] Programmatic way to access pending tasks for an actor?
#16641 commented on
Jun 16, 2025 • 0 new comments -
[Core] Erroneous check for size_t underflow
#16626 commented on
Jun 16, 2025 • 0 new comments -
[Core] Standardize Timestamps across codebase
#16510 commented on
Jun 16, 2025 • 0 new comments -
[test][MLDataset] Fix test_from_modin
#16357 commented on
Jun 16, 2025 • 0 new comments -
[doc] Update instructions for wheel installation
#24533 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Simplex action space shape
#24529 commented on
Jun 16, 2025 • 0 new comments -
[Tune] Make it easy to configure logger level
#24447 commented on
Jun 16, 2025 • 0 new comments -
[tune] improve documentation around "resource exhausted error"
#24439 commented on
Jun 16, 2025 • 0 new comments -
[Core] Unify RegisterClient and AnnounceWorkerPort
#24432 commented on
Jun 16, 2025 • 0 new comments -
[core] Annotation and docstring for ray.remote wrapped functions
#24411 commented on
Jun 16, 2025 • 0 new comments -
[AIR] `Result` object doesn't work with Ray Client
#24396 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Current Implementation of Replay Buffer is not a True Circular Buffer
#24393 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] wrong env step counting when train multi-agent with shared default policy
#24340 commented on
Jun 16, 2025 • 0 new comments -
[Core][observability] Enable observability features built in gRPC
#24327 commented on
Jun 16, 2025 • 0 new comments -
[Autoscaler][Docs] Add up-to-date docs on how the autoscaler works.
#24323 commented on
Jun 16, 2025 • 0 new comments -
[Ray Serve Autoscaling] Add release test that checks that nodes scale down when there are no requests
#24315 commented on
Jun 16, 2025 • 0 new comments -
Received message larger than max (105683136 vs. 104857600)
#24286 commented on
Jun 16, 2025 • 0 new comments -
[Serve] [Doc] HTTP Adapters Cookbooks
#24245 commented on
Jun 16, 2025 • 0 new comments -
[Serve] Default DAGDriver implementation cannot serve.run() or serve.build() twice
#24122 commented on
Jun 16, 2025 • 0 new comments -
[AIR] Support functionality to stitch Preprocessor with Keras model
#24023 commented on
Jun 16, 2025 • 0 new comments -
[Core] Log propagation between actor exit called and process terminated
#24020 commented on
Jun 16, 2025 • 0 new comments -
[<Ray component: Serve] Improve access by index/key on intermediate result in Serve deployment graph
#23987 commented on
Jun 16, 2025 • 0 new comments -
[Serve] [Docs] Improve architectural diagrams
#23956 commented on
Jun 16, 2025 • 0 new comments -
[Runtime Env] Dependency Installation private git repositories via ssh
#23768 commented on
Jun 16, 2025 • 0 new comments -
[ray client] ray.wait timeout is not respected when connection is interrupted
#23694 commented on
Jun 16, 2025 • 0 new comments -
[Feature] [Tune] Trial-wise dependencies
#23654 commented on
Jun 16, 2025 • 0 new comments -
[Bug] `policies_to_train` throws incorrect/confusing error message when passed an empty list.
#23646 commented on
Jun 16, 2025 • 0 new comments -
[Feature] support of complicated action space in QMix algorithm in Rllib.
#23634 commented on
Jun 16, 2025 • 0 new comments -
[AIR] MLflow integration polish
#25156 commented on
Jun 16, 2025 • 0 new comments -
[AIR] TensorFlow warns to use `distribute.MultiWorkerMirroredStrategy` when I'm already using it
#25140 commented on
Jun 16, 2025 • 0 new comments -
[air] Have a default column for not frequent enough categories for OHE
#25096 commented on
Jun 16, 2025 • 0 new comments -
[Core] Make NodeManager unit testable
#25095 commented on
Jun 16, 2025 • 0 new comments -
[AIR] Improve logging for train
#25088 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Hope RLlib can support DQfD & POfD
#25058 commented on
Jun 16, 2025 • 0 new comments -
[AIR] Support postprocessing in Predictors
#24979 commented on
Jun 16, 2025 • 0 new comments -
[AIR] Add a `TorchVision` preprocessor
#24976 commented on
Jun 16, 2025 • 0 new comments -
[AIR/Train] Torch: Automatically unpack model when checkpointing state dicts
#24975 commented on
Jun 16, 2025 • 0 new comments -
[AIR/Train] Automatically return the framework specific dataset in `train_loop_per_worker`
#24974 commented on
Jun 16, 2025 • 0 new comments -
[<Ray component: RLlib>] ppo error when not using critic
#24907 commented on
Jun 16, 2025 • 0 new comments -
[RLlib]: Add tabular models to ModelV2
#24882 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Error when converting GYM Robotics env to Multi-agent Env with the make_multi_agent wrapper
#24881 commented on
Jun 16, 2025 • 0 new comments -
[tune] SigOptSearch suggester is not serialisable
#24864 commented on
Jun 16, 2025 • 0 new comments -
[core] Add basic metrics for lineage reconstruction
#24855 commented on
Jun 16, 2025 • 0 new comments -
[Core] Enhance runtime env state when `ray list runtime-env` is used.
#24838 commented on
Jun 16, 2025 • 0 new comments -
[Core] Refactor Ray memory codepath to follow same pattern as `ray list tasks`.
#24836 commented on
Jun 16, 2025 • 0 new comments -
[Core] Reach parity of task status for `ray memory` and `ray list tasks`
#24835 commented on
Jun 16, 2025 • 0 new comments -
[Tune] `MedianStoppingRule` mishandles `nan`s
#24809 commented on
Jun 16, 2025 • 0 new comments -
[Serve] Simplify json_serde of deployment graph
#24620 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Metrics not reported with Client/Server and env=None
#24601 commented on
Jun 16, 2025 • 0 new comments -
[Core] In C++, there are D_GLIBCXX_USE_CXX11_ABI settings conflicts when both Ray and Arrow are used.
#24566 commented on
Jun 16, 2025 • 0 new comments -
[Serve] `Deployment.url` not updated after options changing name or prefix.
#24548 commented on
Jun 16, 2025 • 0 new comments -
[Rllib] Lack validation for "num_workers" parameter in DDPGTrainer.
#24536 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] Deflake `test_runtime_env_working_dir_2`
#23569 commented on
Jun 16, 2025 • 0 new comments -
Enhance state notification pattern in Ray pubsub
#22340 commented on
Jun 16, 2025 • 0 new comments -
[Core] Avoiding subscribing to all logs by each log subscriber
#22274 commented on
Jun 16, 2025 • 0 new comments -
[Train] TPU support
#22251 commented on
Jun 16, 2025 • 0 new comments -
[train] support per epoch shuffling with `prepare_dataloader`
#22108 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] Refactor `pip` protobuf to store a single str (`requirements.txt` contents) instead of list of "packages"
#22097 commented on
Jun 16, 2025 • 0 new comments -
[runtime_env] Remove `.lock` files after URI garbage collection
#22062 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] Use LRU cache for URIs instead of random eviction
#22060 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] Use single URI for `py_modules` field
#22059 commented on
Jun 16, 2025 • 0 new comments -
[Train] Add callback preprocessor that smoothly tracks values
#21989 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Policy - ActionDistribution Type
#21973 commented on
Jun 16, 2025 • 0 new comments -
[runtiime env] Use coroutine to create runtime envs in `runtime_env_agent`
#21950 commented on
Jun 16, 2025 • 0 new comments -
[Train] Add support for Bagua
#21934 commented on
Jun 16, 2025 • 0 new comments -
[Bug] "The kernel has died..." during Ray tune.run
#21917 commented on
Jun 16, 2025 • 0 new comments -
[Jobs] Backwards compatibility tests for REST API
#21915 commented on
Jun 16, 2025 • 0 new comments -
[Jobs] Make jobs work out-of-the-box with cluster YAML
#21911 commented on
Jun 16, 2025 • 0 new comments -
[Train] Support for averaging results
#21849 commented on
Jun 16, 2025 • 0 new comments -
AttributeError raised when using response_model in FastAPI route decorator
#21744 commented on
Jun 16, 2025 • 0 new comments -
[Feature] [runtime env] [C++] support a strong-typed API in C++
#21733 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] Cross-language runtime env
#21731 commented on
Jun 16, 2025 • 0 new comments -
[Testing] multi fake node set up doesn't work under non ray client mode
#21653 commented on
Jun 16, 2025 • 0 new comments -
[Bug] "Sent message larger than max" error with dask
#21601 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] Can we avoid merging two runtime envs?
#21494 commented on
Jun 16, 2025 • 0 new comments -
[Tune] [Bug] Ray checkpoint sync can sometimes fail to upload checkpoints to s3, plus log spew about sync client observed
#21469 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] raise exception for unsupported runtime_env features on Windows
#21435 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] [Feature] Make Internal KV operations async
#23567 commented on
Jun 16, 2025 • 0 new comments -
[Feature] .bind() on function does not take pre-bind value from upstream DAGNode
#23511 commented on
Jun 16, 2025 • 0 new comments -
[RLlib][Bug] RLLib Dreamer tuned example requesting unreasonable amount of GPU memory
#23479 commented on
Jun 16, 2025 • 0 new comments -
[Core] Add a warning message if options / arguments differ for Actor.options(get_if_exists=True)
#23455 commented on
Jun 16, 2025 • 0 new comments -
[RLlib][Feature] Feature Importance Plots
#23447 commented on
Jun 16, 2025 • 0 new comments -
[air] If you kill train via control C, a bunch of random error messages show up next time you run Train.
#23431 commented on
Jun 16, 2025 • 0 new comments -
[air] Logging message is not relevant to user
#23430 commented on
Jun 16, 2025 • 0 new comments -
[RLlib][docs] Adding more flow charts to RLlib components docs
#23393 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] Warn user if pip check fails
#23335 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] Refactor packaging code
#23257 commented on
Jun 16, 2025 • 0 new comments -
[Feature] Cleanup current use of `other_args_to_resolve` that passes deployment object into ClassNode
#23243 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] Improve tracking of URI size
#23186 commented on
Jun 16, 2025 • 0 new comments -
[updater][Bug] update fails on preempted node and autoscaler stops scheduling
#23182 commented on
Jun 16, 2025 • 0 new comments -
Pipeline ingress requires trailing /
#23048 commented on
Jun 16, 2025 • 0 new comments -
Shouldn't require `PipelineInputNode` to build a pipeline DAG
#23037 commented on
Jun 16, 2025 • 0 new comments -
Pipeline DAG sanity check for model wrappers fields
#23019 commented on
Jun 16, 2025 • 0 new comments -
Pipeline doesn't accept importable class as arguments
#23016 commented on
Jun 16, 2025 • 0 new comments -
[tune][Bug] 'tune.report( mean_accuracy=sklearn.metrics.accuracy_score(test_y, pred_labels), done=True)'where can i get the mean_accuary result?
#22992 commented on
Jun 16, 2025 • 0 new comments -
[Train] add logging to `finish_training` for existing `Callback`s
#22754 commented on
Jun 16, 2025 • 0 new comments -
[Bug] [serve] Accessing shared objects within a deployment
#22751 commented on
Jun 16, 2025 • 0 new comments -
[Feature] Client version check on commit
#22675 commented on
Jun 16, 2025 • 0 new comments -
[Jobs] run all doc examples in CI
#22487 commented on
Jun 16, 2025 • 0 new comments -
Some tests misusing assertTrue for comparisons
#22395 commented on
Jun 16, 2025 • 0 new comments -
[Enhancement][client] Move synchronous GetObject calls to datapath
#22357 commented on
Jun 16, 2025 • 0 new comments -
[dashboard] ignore reinit error when getting dashboard url
#40545 commented on
Jun 19, 2025 • 0 new comments -
Update pettingzoo_env.py
#39431 commented on
Jun 17, 2025 • 0 new comments -
[ci] remove is_automated_build in setup.py
#36547 commented on
Jun 16, 2025 • 0 new comments -
[RLLib][Air] MLFlow parsing of RLLib evaluation and custom metrics
#26711 commented on
Jun 17, 2025 • 0 new comments -
[Core] Allow task retry for `ray.cancel`
#26254 commented on
May 29, 2025 • 0 new comments -
[RLlib] Fix Issue #25316: unconfigurable `dist_dim` for custom multi-action distributions
#25490 commented on
May 29, 2025 • 0 new comments -
[Core]Can’t connect to ray cluster when passing `runtime_env` to `ray.init`
#44757 commented on
Jun 28, 2025 • 0 new comments -
CI test linux://rllib:examples/connectors/flatten_observations_dict_space_impala is flaky
#49754 commented on
Jun 28, 2025 • 0 new comments -
CI test windows://python/ray/tests:test_object_store_metrics is flaky
#49514 commented on
Jun 28, 2025 • 0 new comments -
[ray|llm] ray lora DiskMultiplexConfig loss load from local path to disk_cache
#53315 commented on
Jun 28, 2025 • 0 new comments -
[core] Implement runtime plugins for additional package managers (mamba, micromamba, pixi, etc.)
#45572 commented on
Jun 27, 2025 • 0 new comments -
CI test linux://rllib:learning_tests_multi_agent_pendulum_sac_multi_cpu is flaky
#47264 commented on
Jun 27, 2025 • 0 new comments -
[Serve] DeploymentResponse._to_object_ref() blocks untill final results from actor
#46893 commented on
Jun 27, 2025 • 0 new comments -
[core] Get IP Address of Actor
#7431 commented on
Jun 27, 2025 • 0 new comments -
[RFC] [Serve] Custom Scaling
#41135 commented on
Jun 27, 2025 • 0 new comments -
[Dashboard] Decoupling dashboard and dashboard lifetime from Ray Cluster
#46444 commented on
Jun 27, 2025 • 0 new comments -
[RLlib] `TorchMultiCategorical.to_deterministic()` cannot handle Multi-agent + LSTM case
#52177 commented on
Jun 27, 2025 • 0 new comments -
[core][ray client] fetch_local flag to ray.wait is not respected for ray client
#52401 commented on
Jun 26, 2025 • 0 new comments -
Check failed: WarmupStore() when starting process
#53094 commented on
Jun 26, 2025 • 0 new comments -
CI test linux://rllib:learning_tests_multi_agent_cartpole_ppo_multi_gpu is flaky
#46226 commented on
Jun 26, 2025 • 0 new comments -
[Core] Ray Label Selector API Implementation Tracker
#51564 commented on
Jun 26, 2025 • 0 new comments -
[Ray serve] StopAsyncIteration error thrown by ray when the client cancels the request
#51598 commented on
Jun 25, 2025 • 0 new comments -
TypeError: Descriptors cannot not be created directly.
#36417 commented on
Jun 25, 2025 • 0 new comments -
[Autoscaler, data] Ray starts `AutoscalingRequester` even when using `enableInTreeAutoscaling`
#51559 commented on
Jun 25, 2025 • 0 new comments -
[Fix][Core] Periodically check log message queue cleared before shutdown
#49337 commented on
Jun 25, 2025 • 0 new comments -
[WIP] Remove VM cluster autoscaler docker implementation
#49238 commented on
Jun 17, 2025 • 0 new comments -
Update azure.md - Missing azure dependency
#49104 commented on
Jun 26, 2025 • 0 new comments -
[train] Make dataset argument covariant
#48999 commented on
Jun 19, 2025 • 0 new comments -
[Build][Deps] Add new `ray[azure]` extra package
#48847 commented on
Jun 25, 2025 • 0 new comments -
[Fix][GCS] Implement reconnection for RedisContext
#48781 commented on
Jun 25, 2025 • 0 new comments -
[Core]: Fix ConnectionError on Autoscaler CR lookups in K8s clusters …
#48675 commented on
Jun 22, 2025 • 0 new comments -
[runtime env]: Integrating ROCm Systems Profiler to Ray worker process
#48525 commented on
Jun 19, 2025 • 0 new comments -
Fix invalid type for progress_reporter parameter of RunConfig
#48439 commented on
Jun 19, 2025 • 0 new comments -
[doc] fix: Typo and missing import in doc
#48311 commented on
Jun 19, 2025 • 0 new comments -
[WIP][core] C++20 upgrade
#48044 commented on
Jun 19, 2025 • 0 new comments -
:bug: do not modify user-provided runtime_env
#48021 commented on
Jun 22, 2025 • 0 new comments -
[Data] Fix parallelism deriving heuristic to ensure parallelism stays w/in min/max bounds
#47695 commented on
Jun 28, 2025 • 0 new comments -
[bazel] move python rules up
#47260 commented on
Jun 27, 2025 • 0 new comments -
Fix mlflow artifact logging
#46570 commented on
Jun 25, 2025 • 0 new comments -
[URL] Change the absolute path to a relative path to solve the ingres…
#45933 commented on
Jun 24, 2025 • 0 new comments -
Fix malformed `temp_dir` path when connecting Windows workers to cluster with Linux head
#45930 commented on
Jun 25, 2025 • 0 new comments -
Enable setting OS disk size in Azure
#45867 commented on
Jun 25, 2025 • 0 new comments -
blind try on ubuntu upgrade ..
#45427 commented on
Jun 16, 2025 • 0 new comments -
[core] object store data transfer zstd
#44755 commented on
Jun 17, 2025 • 0 new comments -
Ray IPv6 support
#44252 commented on
Jun 19, 2025 • 0 new comments -
remove flaky marker from test
#44033 commented on
Jun 26, 2025 • 0 new comments -
verify windows wheels.
#43442 commented on
Jun 24, 2025 • 0 new comments -
Adapt the joblib backend for compatibility with `return_as=generator`
#41028 commented on
Jun 16, 2025 • 0 new comments -
[core] Ray fails to reuse GPU to create new actor when CUDA_VISIBLE_DEVICES is set
#44821 commented on
Jun 25, 2025 • 0 new comments -
[data] Optimize Dataset.unique()
#38764 commented on
Jun 19, 2025 • 0 new comments -
[core][gpu-objects] Actor sends the same ObjectRef twice to another actor
#51273 commented on
Jun 19, 2025 • 0 new comments -
[Autoscaler] Improve NodeProvider interface, make it easier to extend it to cluster managers (e.g. Fargate)
#25134 commented on
Jun 19, 2025 • 0 new comments -
[RLlib] Observation space with 2 dimensions not working with the new API stack
#46631 commented on
Jun 19, 2025 • 0 new comments -
[Ray Client] - Client server failed with runtime_env container
#29852 commented on
Jun 19, 2025 • 0 new comments -
[Ray Train] XGBoostTrainer crashes with ActorDiedError when using num_workers > 1 and use_gpu=False
#53123 commented on
Jun 19, 2025 • 0 new comments -
[Serve] Allow HTTPs Options in Ray Serve
#26814 commented on
Jun 19, 2025 • 0 new comments -
[Core] Make Ray Core tasks/actors metrics counters (accumulators)
#47522 commented on
Jun 18, 2025 • 0 new comments -
[RLlib]
#52683 commented on
Jun 18, 2025 • 0 new comments -
Incorrect default value of CUBLAS_WORKSPACE_CONFIG
#47690 commented on
Jun 18, 2025 • 0 new comments -
[Serve] make various default values of `AutoscalingConfig.max_replicas` consistent and >1
#50222 commented on
Jun 18, 2025 • 0 new comments -
CI test linux://rllib:learning_tests_stateless_cartpole_appo_gpu is flaky
#47295 commented on
Jun 18, 2025 • 0 new comments -
[core][compiled graph] Support all-to-one collective ops (e.g. reduce)
#49324 commented on
Jun 18, 2025 • 0 new comments -
[autoscaler] SubnetId, a valid AWS field, is being ignored in cluster yaml
#14551 commented on
Jun 18, 2025 • 0 new comments -
[Serve] `fastapi_app` is still mutable in the deployment constructor after being passed to `@serve.ingress`
#52775 commented on
Jun 17, 2025 • 0 new comments -
[tune] `URI has empty scheme` error when `storage_path` in `RunConfig` is relative
#42969 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Specify different images for each deployment
#52994 commented on
Jun 17, 2025 • 0 new comments -
[Conda] Ray should raise exception when ray is not installed in conda environment
#52672 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Allow --metrics-export-port argument in "serve run" CLI command
#44426 commented on
Jun 17, 2025 • 0 new comments -
[data] verbose_progress=True doesn't work in client mode
#43200 commented on
Jun 17, 2025 • 0 new comments -
[data] importing ray.data closes logging handlers, breaking custom logging
#48846 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] TorchDistributionWrapper Typing Information Should Be Changed
#33997 commented on
Jun 17, 2025 • 0 new comments -
[Core] DecodeError when `ray.put` a large (2GB) object
#35976 commented on
Jun 17, 2025 • 0 new comments -
[core][gpu-objects] Allow tensor metadata to be specified ahead of time for improved performance
#51279 commented on
Jun 17, 2025 • 0 new comments -
[Core] [Observability] Add PID to structured logs
#52840 commented on
Jun 25, 2025 • 0 new comments -
CI test linux://rllib:learning_tests_cartpole_dqn_multi_cpu is flaky
#47214 commented on
Jun 24, 2025 • 0 new comments -
[Azure] Ray up for Azure fails
#48976 commented on
Jun 24, 2025 • 0 new comments -
[Data] Aggregation is doing internal conversions that breaks on list-like AggType
#52257 commented on
Jun 24, 2025 • 0 new comments -
Ray Serve Replica Initialization Timeout: STDOUT "Failed to load", RequestCancelledError, Likely Due to Slow/Crashing RLModule.from_checkpoint()
#53079 commented on
Jun 24, 2025 • 0 new comments -
StreamSplitDataIterator(epoch=-1, split=0) blocked waiting on other clients for more than 30s.
#42008 commented on
Jun 24, 2025 • 0 new comments -
Clusters (AWS) - SSH Access to head node via AWS Session Manager
#38885 commented on
Jun 24, 2025 • 0 new comments -
[Core] Submitted containerized job is stuck in pending mode
#37293 commented on
Jun 24, 2025 • 0 new comments -
[Autoscaler][v1] Autoscaler launches extra nodes despite fulfilled resource demand
#52864 commented on
Jun 24, 2025 • 0 new comments -
[RFC] GPU object store support in Ray Core
#51173 commented on
Jun 23, 2025 • 0 new comments -
[Data] `dataset.write_iceberg` error
#52967 commented on
Jun 23, 2025 • 0 new comments -
[Dashboard] A button to shut down the ray cluster from the dashboard UI
#29208 commented on
Jun 23, 2025 • 0 new comments -
[Core] Support setting options to the pip install command
#52679 commented on
Jun 23, 2025 • 0 new comments -
Ray kill actor API is a GET request
#18411 commented on
Jun 23, 2025 • 0 new comments -
[Core] ux issues of ray state cli for tasks
#30805 commented on
Jun 23, 2025 • 0 new comments -
[VM launcher] Document how to set up the cluster when there is UFW firewall
#35254 commented on
Jun 23, 2025 • 0 new comments -
[serve][dashboard] Show last line instead of first line in Serve app status message
#35600 commented on
Jun 23, 2025 • 0 new comments -
[core][gpu-objects] Support streaming to overlap computation / communication
#51643 commented on
Jun 23, 2025 • 0 new comments -
[Serve] Make replica scheduler backoff configurable
#52871 commented on
Jun 21, 2025 • 0 new comments -
[Core] BUG: Cluster crashes when using temp_dir "could not connect to socket" raylet.x [since 2.7+]
#44431 commented on
Jun 20, 2025 • 0 new comments -
[Ray debugger] Unable to use debugger on slurm cluster
#51157 commented on
Jun 20, 2025 • 0 new comments -
[core][gpu-objects] Ability to register custom types for GPU data
#52340 commented on
Jun 20, 2025 • 0 new comments -
[data] Bad error message when function outputs cannot be pickled
#46642 commented on
Jun 19, 2025 • 0 new comments -
[data] ObjectRefs passed to map UDF are not automatically deref'ed
#49207 commented on
Jun 19, 2025 • 0 new comments -
Bump flask-cors from 4.0.0 to 6.0.0 in /python
#53116 commented on
Jun 24, 2025 • 0 new comments -
Avoid AssertError
#53104 commented on
Jun 15, 2025 • 0 new comments -
WIP: Add iter_torch_batches Tensor cache
#53069 commented on
Jun 19, 2025 • 0 new comments -
[RLlib] Unwrap alpha value in cql torch learner
#53047 commented on
Jun 15, 2025 • 0 new comments -
[core] Remove tests that are permanently skipped with old decorator
#53046 commented on
Jun 17, 2025 • 0 new comments -
[core][tests] Add chaos tests to verify the interaction between actor restarts, task retries, and lineage reconstruction
#53021 commented on
Jun 15, 2025 • 0 new comments -
[core] Node manager related cpp cleanup
#52990 commented on
Jun 17, 2025 • 0 new comments -
[Data] Fixing null-safety when converting to `TensorArray`
#52977 commented on
Jun 28, 2025 • 0 new comments -
[core] Use GetResourceLoadRequest as a substitute liveness check
#52971 commented on
Jun 25, 2025 • 0 new comments -
[RLlib; Offline RL] - Use `iter_torch_batches` in learner
#52968 commented on
Jun 20, 2025 • 0 new comments -
[deps] upgrade pandas to always use 2+
#52961 commented on
Jun 26, 2025 • 0 new comments -
[Data] fix write_iceberg error
#52956 commented on
Jun 28, 2025 • 0 new comments -
[core] Add sync get node info to NodeInfoAccessor
#52928 commented on
Jun 26, 2025 • 0 new comments -
[core] Synchronize locations with pinned_at_raylet_id
#52920 commented on
Jun 25, 2025 • 0 new comments -
Train Tests: Use map_batches for image_classification
#52837 commented on
Jun 24, 2025 • 0 new comments -
[Data] remove empty lance read tasks
#52831 commented on
Jun 15, 2025 • 0 new comments -
Bump minimum pyarrow version to 17
#52820 commented on
Jun 28, 2025 • 0 new comments -
[ci] try running cicd unit tests in forge env
#52792 commented on
Jun 27, 2025 • 0 new comments -
[core][refactor] Move `to_resubmit_` from CoreWorker to TaskManager to avoid an abstraction leak
#52779 commented on
Jun 25, 2025 • 0 new comments -
[core] Remove small task output copy on task execution path
#52778 commented on
Jun 24, 2025 • 0 new comments -
[core] Remove copy when receiving small object returns
#52777 commented on
Jun 24, 2025 • 0 new comments -
[Core][Refactor] Create separate RPCs for cancelling prepared PG bundle and removing PG
#52751 commented on
Jun 24, 2025 • 0 new comments -
[core] Minor pull manager cleanup
#52724 commented on
Jun 24, 2025 • 0 new comments -
check if ray is installed when using conda env
#52677 commented on
Jun 25, 2025 • 0 new comments -
Filter out ANSI escape codes from logs when retrieving logs from the dashboard
#53370 commented on
Jun 19, 2025 • 0 new comments -
fix: Type of AlgorithmConfig.training(learner_connector
#53369 commented on
Jun 25, 2025 • 0 new comments -
[core] Cleanup plasma client and object manager
#53357 commented on
Jun 19, 2025 • 0 new comments -
[WIP] Fix daft test
#53338 commented on
Jun 24, 2025 • 0 new comments -
[core] Avoid making rpc for local GetLocationFromOwner
#53322 commented on
Jun 17, 2025 • 0 new comments -
Omar/kuberay anyscale
#53318 commented on
Jun 17, 2025 • 0 new comments -
[data] Add GroupedData.random_sample() for group-wise sampling
#53313 commented on
Jun 28, 2025 • 0 new comments -
[core] Core worker get cv - notify after unlock
#53311 commented on
Jun 17, 2025 • 0 new comments -
Make core worker testable
#53299 commented on
Jun 19, 2025 • 0 new comments -
[core][autoscaler][v1] drop object_store_memory from ResourceDemandScheduler._update_node_resources_from_runtime
#53283 commented on
Jun 18, 2025 • 0 new comments -
Bump tornado from 6.1 to 6.5.1 in /python
#53274 commented on
Jun 24, 2025 • 0 new comments -
Override Autoscaler
#53245 commented on
Jun 17, 2025 • 0 new comments -
[data] add explain interface for dataset
#53235 commented on
Jun 28, 2025 • 0 new comments -
[data] New landing page with better examples that show key workloads
#53228 commented on
Jun 28, 2025 • 0 new comments -
[core] enable -Wshadow for all c++ targets
#53194 commented on
Jun 17, 2025 • 0 new comments -
[core] Returning a useful message when trying to get logs for a job that has not started yet
#53174 commented on
Jun 17, 2025 • 0 new comments -
[draft] Submit Ray release test as RayJob to Kuberay GKE
#53165 commented on
Jun 20, 2025 • 0 new comments -
[data] fix lance count_rows not support filter
#53162 commented on
Jun 28, 2025 • 0 new comments -
[docs] updating broken links on rllib torch doc
#53161 commented on
Jun 26, 2025 • 0 new comments -
[core] Don't try to monitor zipped files
#53151 commented on
Jun 19, 2025 • 0 new comments -
Make vllm_engine a deployment
#53139 commented on
Jun 17, 2025 • 0 new comments -
Fix broken Ray Workflows documentation link in README.rst
#53136 commented on
Jun 17, 2025 • 0 new comments -
[data] fix lance dataset schema
#53134 commented on
Jun 28, 2025 • 0 new comments -
macos wheel build debug
#53119 commented on
Jun 25, 2025 • 0 new comments -
[Dashboard] Allow getting dashboard URL via RuntimeContext
#52676 commented on
Jun 25, 2025 • 0 new comments -
[core] Get cloud provider with ray on kubernetes
#51793 commented on
Jun 23, 2025 • 0 new comments -
[core] Remove object store runner
#51766 commented on
Jun 23, 2025 • 0 new comments -
[Core] Native CPU affinity support for accelerators
#51719 commented on
Jun 27, 2025 • 0 new comments -
[core] Lazily subscribe to node changes from workers
#51718 commented on
Jun 23, 2025 • 0 new comments -
windows dev setup
#51678 commented on
Jun 24, 2025 • 0 new comments -
[Docs][wip] Feature: adopt llms.txt convention
#51605 commented on
Jun 26, 2025 • 0 new comments -
[Core] Cover cpplint for ray/src/ray/common
#51551 commented on
Jun 24, 2025 • 0 new comments -
[Dashboard] Support reporting AMD GPU usage
#51345 commented on
Jun 27, 2025 • 0 new comments -
[CI] Replace `black` with `ruff format`
#51332 commented on
Jun 25, 2025 • 0 new comments -
[core] Always create a default executor
#51058 commented on
Jun 23, 2025 • 0 new comments -
[doc] add jax example
#51040 commented on
Jun 22, 2025 • 0 new comments -
Suppress type error
#50994 commented on
Jun 23, 2025 • 0 new comments -
fix restore BUG "RuntimeError: Expected scalars to be on CPU, got cud…
#50983 commented on
Jun 22, 2025 • 0 new comments -
[core] Cover cpplint for ray/src/ray/stats
#50678 commented on
Jun 23, 2025 • 0 new comments -
[Core] Split stats_metric into smaller targets to improve build performance
#50595 commented on
Jun 23, 2025 • 0 new comments -
[RLlib] Enable spliting and zero padding of Dict observation
#50589 commented on
Jun 22, 2025 • 0 new comments -
[Autoscaler][V2] Use running node instances to rate-limit upscaling
#50414 commented on
Jun 22, 2025 • 0 new comments -
Update multi-agent-envs.rst
#50075 commented on
Jun 22, 2025 • 0 new comments -
[core] Thread-safe gcs node manager
#50024 commented on
Jun 22, 2025 • 0 new comments -
[DATA]Add custom resources in data autoscaling
#49756 commented on
Jun 28, 2025 • 0 new comments -
[core] Don't get dashboard address after each dashboard connection failure
#49584 commented on
Jun 22, 2025 • 0 new comments -
[core][cgraph] Use cv instead of busy wait for next version
#49542 commented on
Jun 23, 2025 • 0 new comments -
[core][cgraph] Use threadpool and one io_context for mutable object provider
#49500 commented on
Jun 22, 2025 • 0 new comments -
[core] [easy] readability improvements for IO Workers
#52590 commented on
Jun 26, 2025 • 0 new comments -
[Dashboard] Add Worker ID column to Worker table in Node detail page
#52581 commented on
Jun 28, 2025 • 0 new comments -
[Data] added XML datasource
#52539 commented on
Jun 17, 2025 • 0 new comments -
[core] Static Priority Scheduling (3/N)
#52506 commented on
Jun 24, 2025 • 0 new comments -
[core] Static Priority scheduling (4/N)
#52489 commented on
Jun 24, 2025 • 0 new comments -
[core] Static Priority scheduling (2/N)
#52465 commented on
Jun 24, 2025 • 0 new comments -
[core] Static Priority Scheduling (1/N)
#52439 commented on
Jun 23, 2025 • 0 new comments -
[Serve] Immediately terminate unscheduled replicas
#52416 commented on
Jun 15, 2025 • 0 new comments -
[core] Minor task manager related improvements
#52294 commented on
Jun 19, 2025 • 0 new comments -
[build] warning when username or homedir include @ character
#52274 commented on
Jun 25, 2025 • 0 new comments -
[train] upgrade tensorflow-datasets
#52195 commented on
Jun 24, 2025 • 0 new comments -
upgrade path to python protobuf 4
#52194 commented on
Jun 24, 2025 • 0 new comments -
[Data,Train] Add helpful errors when running forbidden methods on sharded datasets
#52079 commented on
Jun 27, 2025 • 0 new comments -
[WIP] Ray Data doc updates
#52062 commented on
Jun 27, 2025 • 0 new comments -
[Data] Make `from_items` lineage serializable
#52026 commented on
Jun 28, 2025 • 0 new comments -
[Chore][Dashboard] Move `TrainHead` to `python/ray/train` folder
#52014 commented on
Jun 25, 2025 • 0 new comments -
[Chore][Dashboard] Move DataHead to python/ray/data/ folder
#52013 commented on
Jun 28, 2025 • 0 new comments -
test for raycirun
#52012 commented on
Jun 24, 2025 • 0 new comments -
[Core] Deserialization of PyArrow Extension Arrays by registration of deserializers
#51972 commented on
Jun 20, 2025 • 0 new comments -
[Fix][Core] Fail fast if the dashboard agent fails to launch the HTTP server
#51960 commented on
Jun 25, 2025 • 0 new comments -
Add new autoscaling parameter `aggregation function`
#51905 commented on
Jun 19, 2025 • 0 new comments -
[Data] Fix bug where pandas blocks don't use tensor extension
#51868 commented on
Jun 28, 2025 • 0 new comments -
[core][wip] Trying bzlmod
#51834 commented on
Jun 23, 2025 • 0 new comments -
[core] Remove client call tag
#51817 commented on
Jun 19, 2025 • 0 new comments -
[Runtime Env] Add docstring for public class methods and attributes
#32704 commented on
Jun 17, 2025 • 0 new comments -
[tune] Add suggestions on when `reuse_actor` should be set to false.
#32698 commented on
Jun 17, 2025 • 0 new comments -
[serve] serve run doesn't restart app successfully in some environments
#32633 commented on
Jun 17, 2025 • 0 new comments -
[train] Big performance hit when TensorFlow trainer is not scheduled on head node
#32509 commented on
Jun 17, 2025 • 0 new comments -
[doc][tune] clarify `Stopper`, what is `training_iteration`
#32497 commented on
Jun 17, 2025 • 0 new comments -
[release] update our xgboost release test to catch issues like (see discription)
#32491 commented on
Jun 17, 2025 • 0 new comments -
[Core] The remote function in the worker no longer runs after the head crashes
#32454 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] Special __common__ key in MultiAgent batches is not documented
#32399 commented on
Jun 17, 2025 • 0 new comments -
[tune] update how trainable reports result/checkpoint to driver
#32380 commented on
Jun 17, 2025 • 0 new comments -
[Datasets] The projection pushdown cannot work with hive style partitioning file path
#32301 commented on
Jun 17, 2025 • 0 new comments -
[Core][utilization] some anti-pattern that not well supported by Ray core.
#32297 commented on
Jun 17, 2025 • 0 new comments -
[tune/train] Provide actionable error messages for common thirdparty errors
#32232 commented on
Jun 17, 2025 • 0 new comments -
[ci] Mirror external dependenies in CI
#32113 commented on
Jun 17, 2025 • 0 new comments -
[Serve] ValueError: Message ray.serve.ReplicaConfig exceeds maximum protobuf size of 2GB
#32049 commented on
Jun 17, 2025 • 0 new comments -
[CLI] make `ray get-head-ip` and `ray get-worker-ips` work for kuberay clusters when run outside the cluster
#32037 commented on
Jun 17, 2025 • 0 new comments -
Serve build usage of click CLI library conflicts python argparse
#32001 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Version Support in 2.X API
#31928 commented on
Jun 17, 2025 • 0 new comments -
[Train] User exceptions not propagated from remote cluster
#31913 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] AlgorithmConfig() defaults not used by build_sac_model when implementing custom model
#31783 commented on
Jun 17, 2025 • 0 new comments -
[kubernetes/cluster] More guides on deployment
#31623 commented on
Jun 17, 2025 • 0 new comments -
[core][state] ray log supporting regex searching
#31549 commented on
Jun 17, 2025 • 0 new comments -
[Tune] Support NLopt search algorithms
#31492 commented on
Jun 17, 2025 • 0 new comments -
[Rllib] Possible Redudant Code
#31463 commented on
Jun 17, 2025 • 0 new comments -
[aws] ray submit --stop fails on aws
#31380 commented on
Jun 17, 2025 • 0 new comments -
[Ray status] confusing output about gpus and accelerators
#33272 commented on
Jun 17, 2025 • 0 new comments -
[Tune] mlflow logger callback > log_trial_result fail (psycopg2.ProgrammingError) can't adapt type 'numpy.int64'
#33233 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Enhance replica upgrade process.
#33192 commented on
Jun 17, 2025 • 0 new comments -
[air output] Isolate/refactor/improve rllib related progress reporting logic
#33150 commented on
Jun 17, 2025 • 0 new comments -
[Tune][wandb] Report tune experiments as a wandb `sweep`
#33142 commented on
Jun 17, 2025 • 0 new comments -
[AIR][wandb] Add option to track artifact references in wandb if using cloud storage
#33130 commented on
Jun 17, 2025 • 0 new comments -
[AIR][Tune] Add an option in `WandbLoggerCallback` to group wandb runs by config
#33084 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Support external storage for state
#33059 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Use the namespace of context instead of "serve" when the Controller gets all running Actors
#33057 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Specify replicas when scaling down
#33056 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Restart a batch of replicas by Actor names or replica tags
#33055 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Specify a batch of replicas to update their user_config
#33054 commented on
Jun 17, 2025 • 0 new comments -
Ray Core Runtime Environments with tea.xyz
#33049 commented on
Jun 17, 2025 • 0 new comments -
[Ray Tune] Support for continuing training when metrics are only reported from some of the workers
#33042 commented on
Jun 17, 2025 • 0 new comments -
[Data] Cannot get the length of a tf dataset created from `ray_ds.to_tf`
#33004 commented on
Jun 17, 2025 • 0 new comments -
[Data] Include image class id in the returned datasets of `ray.data.read_images()`.
#32989 commented on
Jun 17, 2025 • 0 new comments -
[Datasets] Raise descriptive error if `iter_torch_batches` can't convert data
#32953 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Don't start Serve agent if Serve isn't installed
#32920 commented on
Jun 17, 2025 • 0 new comments -
[Data]: `ds.take()` and `ds.iter_batches()` have unexpected different behavior for pd.Series columns
#32913 commented on
Jun 17, 2025 • 0 new comments -
[Ray: Serve] Model Composition primitives should be part of Serve Core API docs.
#32837 commented on
Jun 17, 2025 • 0 new comments -
[core][state] Task backend : already submitted cancelled task showing up as finished
#32826 commented on
Jun 17, 2025 • 0 new comments -
[AIR][Tune] Make trial checkpoint + artifact upload happen atomically
#32823 commented on
Jun 17, 2025 • 0 new comments -
[Tune] During multi-GPU training (using mp.spawn), ray.tune.report does not take effect.
#32810 commented on
Jun 17, 2025 • 0 new comments -
[Tune] failure when using more than one GPU
#32760 commented on
Jun 17, 2025 • 0 new comments -
[Tune] Avoid insufficient resources warning if cluster is autoscaling
#31292 commented on
Jun 17, 2025 • 0 new comments -
[client][runtime_env] Inconsistent runs on ray client
#30518 commented on
Jun 17, 2025 • 0 new comments -
[Core] ray.exceptions.RaySystemError: System error: buffer source array is read-only
#30505 commented on
Jun 17, 2025 • 0 new comments -
[Core] Access violation on windows 11 when running modin workload
#30493 commented on
Jun 17, 2025 • 0 new comments -
Critic Regularized Regression (CRR) model is getting error with Custom Environment (Offline RL)
#30411 commented on
Jun 17, 2025 • 0 new comments -
[Docs] [Jobs] Add pros and cons of different ways of submitting a job
#30305 commented on
Jun 17, 2025 • 0 new comments -
[air/horovod] horovod distributed worker creation may hang
#30276 commented on
Jun 17, 2025 • 0 new comments -
[<Ray component: Workflow>] module 'ray.workflow' has no attribute 'HTTPListener'
#30248 commented on
Jun 17, 2025 • 0 new comments -
[RLLIB][Torch] numerically unstable + mkl issue in torch.sqrt normc_initializer
#30191 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)
#30164 commented on
Jun 17, 2025 • 0 new comments -
[AIR][Tune] Provide user guide on how to build active learning on AIR
#30157 commented on
Jun 17, 2025 • 0 new comments -
[AIR/Docs] Mention/warn that running a Trainer inside a custom Tune trainable is an anti-pattern
#30153 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] (PPO) algo parameter "lambda_" never gets passed because `AlgorithmConfig` refractored "lambda_" to "lambda"
#30143 commented on
Jun 17, 2025 • 0 new comments -
[Core] Reference leakage somewhere after ray.shutdown()
#30089 commented on
Jun 17, 2025 • 0 new comments -
[Tune] Can't access all metrics for all trials
#30004 commented on
Jun 17, 2025 • 0 new comments -
[core][dashboard] state api on worker nodes can not connect to dashboard url
#29959 commented on
Jun 17, 2025 • 0 new comments -
[Jobs] Include requested and available resources in JobInfo status message
#29921 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] Add some metric for aync algos (e.g. APPO) that shows the total number of gradient updates
#29830 commented on
Jun 17, 2025 • 0 new comments -
[AIR] Update pytorch training and prediction benchmark with numpy with updated metrics
#29743 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] Undesired memory growing when using convolutional neural network
#29699 commented on
Jun 17, 2025 • 0 new comments -
[AIR] `XGBoostTrainer` gives misleading error if column missing
#29695 commented on
Jun 17, 2025 • 0 new comments -
[RLLib Tests] : Included pytests in package as well as basic commands fail with ValueError
#29691 commented on
Jun 17, 2025 • 0 new comments -
[air] GPU memory leak when using AIR trainer with torch dataloader when the latter uses multi-processing
#29563 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] Benchmark bandit methods vs plain Thompson Sampling for a non-contextual MAB
#29528 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] "model": {"free_log_std": True} generates Tensorflow Lambda layers warning with TF2 framework
#29502 commented on
Jun 17, 2025 • 0 new comments -
No worker logs in the dashboard after recreating the K8S Ray pods
#31288 commented on
Jun 17, 2025 • 0 new comments -
[core] Please improve warning message for ip mismatch
#31264 commented on
Jun 17, 2025 • 0 new comments -
[Ray Collective] Ray Collective AllGather is Completely Broken
#31259 commented on
Jun 17, 2025 • 0 new comments -
[core][state] Refactor use of bounded LRU/FIFO buffer/map used in task backend
#31158 commented on
Jun 17, 2025 • 0 new comments -
[core] Ray resources should be case-insensitive
#31087 commented on
Jun 17, 2025 • 0 new comments -
[RayCluster]
#31041 commented on
Jun 17, 2025 • 0 new comments -
[Serve] gRPCis should not allow route_prefix set
#30891 commented on
Jun 17, 2025 • 0 new comments -
[General] Setup a "code walkthrough" meetup or tutorial
#30852 commented on
Jun 17, 2025 • 0 new comments -
[RFC][core] Option to avoid scheduling tasks to nodes with disk full
#30843 commented on
Jun 17, 2025 • 0 new comments -
[core] Enable greater control over log verbosity
#30832 commented on
Jun 17, 2025 • 0 new comments -
[Tune] ability to specify search algorithm when using tune.run_experiments()
#30802 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] Deprecate the RLlib spaces that are duplications of gym spaces.
#30800 commented on
Jun 17, 2025 • 0 new comments -
[Tune] Guard against users overriding internal `Trainable` methods
#30795 commented on
Jun 17, 2025 • 0 new comments -
Ray Cluster Resources Issue
#30780 commented on
Jun 17, 2025 • 0 new comments -
[Core] Worker leak
#30731 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] Default policy error in two trainer work flow
#30676 commented on
Jun 17, 2025 • 0 new comments -
[core] Can't set working directory for runtime env in actor definition
#30666 commented on
Jun 17, 2025 • 0 new comments -
[Tune] HeboSearch reproducible deterministic results
#30661 commented on
Jun 17, 2025 • 0 new comments -
[core] Memory changes are not as expected when using ray.get()
#30615 commented on
Jun 17, 2025 • 0 new comments -
[Tune] `fail_fast` marks all runs as terminated, making the experiment impossible to restore
#30584 commented on
Jun 17, 2025 • 0 new comments -
[RLLib] Custom model with LSTM causes the auto wrapping to be partially executed
#30581 commented on
Jun 17, 2025 • 0 new comments -
[Core|RayTrain] RuntimeError: Some workers returned results while others didn't
#30545 commented on
Jun 17, 2025 • 0 new comments -
[Core] Overriding the default logging format for Worker logs
#30544 commented on
Jun 17, 2025 • 0 new comments -
[AIR] Canonical way to determine whether the code is running in a Train/Tune session
#30536 commented on
Jun 17, 2025 • 0 new comments -
[VM launcher] Ran `Ray status` after I sshed in to the head node and it printed "No cluster status"
#35017 commented on
Jun 17, 2025 • 0 new comments -
[air/tune][multi-tenancy] Parallel runs can use the same experiment directory
#35006 commented on
Jun 17, 2025 • 0 new comments -
Issue on page /cluster/vms/examples/ml-example.html
#34996 commented on
Jun 17, 2025 • 0 new comments -
[AWS VM Cluster Launcher] AWS Cluster launcher installs nightly Ray by default
#34991 commented on
Jun 17, 2025 • 0 new comments -
[CI] Fix minimal-install python 3.11: build wheel with unsupported tags.
#34980 commented on
Jun 17, 2025 • 0 new comments -
[serve][docs] Add DAG building classes to the API reference
#34953 commented on
Jun 17, 2025 • 0 new comments -
[AIR output] Rich table gets truncated when the terminal height is smaller than it
#34925 commented on
Jun 17, 2025 • 0 new comments -
[AIR output] Format of trial table with Rich enabled.
#34923 commented on
Jun 17, 2025 • 0 new comments -
[AIR output] "iteration" is shown in the output for RL users
#34918 commented on
Jun 17, 2025 • 0 new comments -
[core] ray.kill doesn't guarantee resources are cleaned up
#34917 commented on
Jun 17, 2025 • 0 new comments -
[Data] Add `fn_kwargs` to `BatchMapper`
#34852 commented on
Jun 17, 2025 • 0 new comments -
Resource Allocation: Ray Core, Ray Client
#34816 commented on
Jun 17, 2025 • 0 new comments -
[Jobs] Job agent recovers all running jobs on restart, not just those monitored by that agent
#34794 commented on
Jun 17, 2025 • 0 new comments -
[Doc] Autogenerated "suggest an edit" link doesn't work
#34751 commented on
Jun 17, 2025 • 0 new comments -
[Tune] thread limit resulting in the job failure in multi-tenancy usage
#34745 commented on
Jun 17, 2025 • 0 new comments -
Ray Job
#34710 commented on
Jun 17, 2025 • 0 new comments -
[docs][infra] automate checks for common link errors
#34681 commented on
Jun 17, 2025 • 0 new comments -
[Ray Job] Auto-shutdown of the cluster when job finished
#34672 commented on
Jun 17, 2025 • 0 new comments -
[Core] Ray.wait should return if task throw exception
#34653 commented on
Jun 17, 2025 • 0 new comments -
[Core] ray2.3.1 gcs_server memory keeps increasing until OOM
#34619 commented on
Jun 17, 2025 • 0 new comments -
[Runtime Env/Ray Job] Job submission fails when specifing local zip file as working dir
#34605 commented on
Jun 17, 2025 • 0 new comments -
why ray.data.read_images cat not combine_chunks
#34563 commented on
Jun 17, 2025 • 0 new comments -
[Core] Add support for cancelling descendants of a completed task
#34545 commented on
Jun 17, 2025 • 0 new comments -
[Data] retrieve written paths from `Dataset.write_datasource`
#34444 commented on
Jun 17, 2025 • 0 new comments -
[air/output] Jupyter notebook trial result table keeps swapping column order
#35838 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] Make Learner more standalone with regards to LearnerHyperparameters
#35788 commented on
Jun 17, 2025 • 0 new comments -
[AIR] `on_trial_complete` callback hook happens before trial resources are freed
#35721 commented on
Jun 17, 2025 • 0 new comments -
[core] Failed to close sockets in CoreWorker when crash.
#35681 commented on
Jun 17, 2025 • 0 new comments -
Ray Data - Glob/wildcard in file path
#35499 commented on
Jun 17, 2025 • 0 new comments -
[serve] Document how to silence access logs from GradioIngress
#35496 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] Windows CLI, cmd.exe, powershell parsing json arguments JSONDecodeError
#35492 commented on
Jun 17, 2025 • 0 new comments -
[RayClient]large object transfer failure
#35448 commented on
Jun 17, 2025 • 0 new comments -
[train] Simplify `test_transformers_trainer_steps::test_e2e_steps`
#35424 commented on
Jun 17, 2025 • 0 new comments -
[Core] Reducing scheduling fragmentation
#35422 commented on
Jun 17, 2025 • 0 new comments -
[Core, RLlib] Multi GPU RLlib experiment is unable to be scheduled.
#35409 commented on
Jun 17, 2025 • 0 new comments -
[Job] Failed to schedule supervisor actor leads to job failure
#35387 commented on
Jun 17, 2025 • 0 new comments -
[Job] Show submitter of a Job on the dashboard
#35367 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Support sync function for multiplexing
#35356 commented on
Jun 17, 2025 • 0 new comments -
[AIR] [Train] train multiple instances simultaneously on machines with specified tags
#35333 commented on
Jun 17, 2025 • 0 new comments -
<RLlib> What is the cause of the low CPU utilization in rllib PPO?
#35313 commented on
Jun 17, 2025 • 0 new comments -
[Data] Infer the data schema in Ray Datasets
#35230 commented on
Jun 17, 2025 • 0 new comments -
[dashboard] how to adjust ray dashboard refresh rate?
#35156 commented on
Jun 17, 2025 • 0 new comments -
[KubeRay, dashboard] Clarify that the users can use persistent volumes for log_dir and ray dashboard can read from it.
#35137 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] Better error handling when return shape from step() mismatch in utils._flatten_multidiscrete
#35113 commented on
Jun 17, 2025 • 0 new comments -
The ray rsync-up cli reports no issue, but actually file is absent on remote side (Ray AWS cluster)
#35051 commented on
Jun 17, 2025 • 0 new comments -
[Core] - GPU Support - Explanation of Results
#35048 commented on
Jun 17, 2025 • 0 new comments -
[Data] Optimize `read_datasource` setup
#35029 commented on
Jun 17, 2025 • 0 new comments -
[EC2 VM Cluster launcher] Document EC2 ssh key limit and workaround
#35020 commented on
Jun 17, 2025 • 0 new comments -
[Docs Infra] [RLLib] Remove "<<<" from code blocks
#34439 commented on
Jun 17, 2025 • 0 new comments -
[Ray AIR] Add more documentation about checkpointing
#33932 commented on
Jun 17, 2025 • 0 new comments -
Ray Workflow
#33844 commented on
Jun 17, 2025 • 0 new comments -
[Train] Intermittent `UnpicklingError` when loading estimator/preprocessor from checkpoint
#33815 commented on
Jun 17, 2025 • 0 new comments -
[AIR output] Warnings for AIR_VERBOSITY is confusing
#33810 commented on
Jun 17, 2025 • 0 new comments -
[air output] Aggregation of feedback for air output v2
#33803 commented on
Jun 17, 2025 • 0 new comments -
[Datasets] `FileBasedDataSource`s do not pass `filesystem` to `_read_stream()` methods' `reader_args`
#33777 commented on
Jun 17, 2025 • 0 new comments -
[Core][Runtime Env] Document how to write custom runtime env plugin
#33746 commented on
Jun 17, 2025 • 0 new comments -
Core: Can the ray core's scheduling mechanism support customized extensions?
#33735 commented on
Jun 17, 2025 • 0 new comments -
[Ray init] Ray init method does not support pathlib.Path
#33672 commented on
Jun 17, 2025 • 0 new comments -
[docs] improve user experience of the API ref
#33645 commented on
Jun 17, 2025 • 0 new comments -
[RLLib] Collecting external experience
#33636 commented on
Jun 17, 2025 • 0 new comments -
[Workflow] get_metadata(workflow_id)["status"] and get_status(workflow_id) not returning the same status
#33633 commented on
Jun 17, 2025 • 0 new comments -
[runtime_env] Actors always depend global `pip` field for `runtime_env`
#33607 commented on
Jun 17, 2025 • 0 new comments -
[Core] Raylet process not respecting `--node-ip-address`
#33554 commented on
Jun 17, 2025 • 0 new comments -
[Tune] Support ExperimentAnalysis.dataframe(mode='mean')
#33540 commented on
Jun 17, 2025 • 0 new comments -
[Train] `RunConfig` doesn't get propagated from the Tuner to the Trainer
#33539 commented on
Jun 17, 2025 • 0 new comments -
[Core] std::bad_alloc error using ray.init()
#33525 commented on
Jun 17, 2025 • 0 new comments -
[Core] `test_memory_deadlock` times out
#33491 commented on
Jun 17, 2025 • 0 new comments -
[Core] Support binding worker processes to NUMA nodes
#33465 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Support for setting `working_dir` to a local directory in `RayService`
#33456 commented on
Jun 17, 2025 • 0 new comments -
RLLIB - RE3 Exploration Algorithm - No GPU support for Dynamic TF V2
#33425 commented on
Jun 17, 2025 • 0 new comments -
[client] kubernetes w ray client
#33367 commented on
Jun 17, 2025 • 0 new comments -
[Train] Reporting metrics/checkpoints from multiple workers
#33360 commented on
Jun 17, 2025 • 0 new comments -
[Data] `read_parquet` schema is incorrect (schema is a dict instead of a string)
#33279 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Production Guide: Add instruction for non-K8s on-premise clusters
#34437 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Ray Serve hangs and becomes unresponsive when calling ffmpeg in deployment
#34414 commented on
Jun 17, 2025 • 0 new comments -
[Serve] Deployments page tasks history is full of system tasks. Not very useful
#34386 commented on
Jun 17, 2025 • 0 new comments -
[Core] serialisation of dataclass in separate module fails to recognise parameter change in child dataclass, but functions correctly if in the same module
#34366 commented on
Jun 17, 2025 • 0 new comments -
ImportError: cannot import name 'torch' from 'ray.rllib.train'
#34354 commented on
Jun 17, 2025 • 0 new comments -
[core][state] Include job info for placement group
#34333 commented on
Jun 17, 2025 • 0 new comments -
[Jobs] Use new API `is_head_node` to find head node
#34317 commented on
Jun 17, 2025 • 0 new comments -
[Core] RFC: simplify CI testing
#34315 commented on
Jun 17, 2025 • 0 new comments -
[air] Error while loading xgboost model in BatchPredictor
#34307 commented on
Jun 17, 2025 • 0 new comments -
[RLlib] Unity 3d env tests are broken
#34290 commented on
Jun 17, 2025 • 0 new comments -
[air/train] the logic to grab free ports for `tf_config` is potentially racy
#34271 commented on
Jun 17, 2025 • 0 new comments -
[Core][Object Store] Push Manager: round for object manager client and FIFO for object
#34270 commented on
Jun 17, 2025 • 0 new comments -
[air] xgboost/lightgbm trainer's validation result differ between online and offline
#34211 commented on
Jun 17, 2025 • 0 new comments -
[tune] support viewing partial experiment result as tuning goes on
#34207 commented on
Jun 17, 2025 • 0 new comments -
[Workflow] Improve efficiency of Ray Workflow by returning workflow metadata and completed task information in single API call
#34158 commented on
Jun 17, 2025 • 0 new comments -
Issue on page /rllib/package_ref/algorithm.html
#34157 commented on
Jun 17, 2025 • 0 new comments -
[Prometheus metrics util] Application level custom metrics aren't getting exported consistently
#34145 commented on
Jun 17, 2025 • 0 new comments -
[Core] Actors not cleaning up resources correct because `force_kill=true`.
#34124 commented on
Jun 17, 2025 • 0 new comments -
Ray Tune + ray xgboost running out of disk space
#34118 commented on
Jun 17, 2025 • 0 new comments -
[Core][Tune]Trials hang when using Pytorch
#34028 commented on
Jun 17, 2025 • 0 new comments -
[Data] `map_batches` hard to use and debug
#34007 commented on
Jun 17, 2025 • 0 new comments -
[Core] improve garbage collection after job go out of scope
#34001 commented on
Jun 17, 2025 • 0 new comments -
[Core] Timeout for unschedulable task due to unavailable workers
#33954 commented on
Jun 17, 2025 • 0 new comments -
[Observability] Programmatically fetch prometheus metrics
#33940 commented on
Jun 17, 2025 • 0 new comments -
[ui] More metadata for the task timeline
#8050 commented on
Jun 16, 2025 • 0 new comments -
[tune] Support for config to (optionally) be an argparse.Namespace?
#8006 commented on
Jun 16, 2025 • 0 new comments -
[tune] Resource Allocation UX
#7968 commented on
Jun 16, 2025 • 0 new comments -
`pandas has no attribute 'compat'` Deserialization bug when running tasks very rarely
#7879 commented on
Jun 16, 2025 • 0 new comments -
"Lost reference to actor" when returning actor handle from actor
#7815 commented on
Jun 16, 2025 • 0 new comments -
Ray has both ray.util and ray.utils, which is confusing.
#7787 commented on
Jun 16, 2025 • 0 new comments -
Provide more scheduling algorithms for actors/tasks
#7723 commented on
Jun 16, 2025 • 0 new comments -
[ray] Object store shared memory numpy leak in worker loop
#7653 commented on
Jun 16, 2025 • 0 new comments -
Ray processes on slave node become defunct when the head node is restarted/stopped
#7651 commented on
Jun 16, 2025 • 0 new comments -
Relax python version match requirement when joining a cluster
#7648 commented on
Jun 16, 2025 • 0 new comments -
Does ray workers could share the same tf.sess?
#7646 commented on
Jun 16, 2025 • 0 new comments -
About model configuration.
#7644 commented on
Jun 16, 2025 • 0 new comments -
Probable race condition
#7617 commented on
Jun 16, 2025 • 0 new comments -
Recursion with pickling in ray.init with py3.5
#7605 commented on
Jun 16, 2025 • 0 new comments -
Is it possible to create process inside ray Actor?
#7578 commented on
Jun 16, 2025 • 0 new comments -
Why seems getting from local object store not faster than getting from remote object store?
#7575 commented on
Jun 16, 2025 • 0 new comments -
[util.multiprocessing] Unable to pass Queue to pool.apply_async
#7561 commented on
Jun 16, 2025 • 0 new comments -
Keyword arguments should be keyword only arguments in the Ray API
#7548 commented on
Jun 16, 2025 • 0 new comments -
[Pool] About using ray.util.multiprocessing import Pool
#7542 commented on
Jun 16, 2025 • 0 new comments -
Reporting Reward Breakdowns
#7518 commented on
Jun 16, 2025 • 0 new comments -
[config] Introduce a configuration library for unified configuration code
#7485 commented on
Jun 16, 2025 • 0 new comments -
Proper way of calling a class method in another method
#7450 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] Provide ability to provide elastic ip when launching cluster
#7446 commented on
Jun 16, 2025 • 0 new comments -
What are system requirements for building on Mac OSX
#7430 commented on
Jun 16, 2025 • 0 new comments -
[core] [docs] use-cases for Ray's async support
#10688 commented on
Jun 16, 2025 • 0 new comments -
Exceptions and ResourceWarnings on ray.init (Jupyter+offline)
#10279 commented on
Jun 16, 2025 • 0 new comments -
Can CPU resource scheduling be scheduled through Cgroup?
#10037 commented on
Jun 16, 2025 • 0 new comments -
Windows debugging on gdb does not work
#9827 commented on
Jun 16, 2025 • 0 new comments -
[util.multiprocessing] Support generators
#9712 commented on
Jun 16, 2025 • 0 new comments -
[Core] A ray.remote flag for nested object ID gathering in task arguments.
#9489 commented on
Jun 16, 2025 • 0 new comments -
[docs] ray up <config.xml> --help does not show help
#9455 commented on
Jun 16, 2025 • 0 new comments -
[docs] Document how to use conda environments with the autoscaler
#9199 commented on
Jun 16, 2025 • 0 new comments -
[ray] Visualize Ray dashboard locally/offline
#9095 commented on
Jun 16, 2025 • 0 new comments -
Confusing RedisError when many threads are used
#9083 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] Check failed: _s.ok() Heartbeat failed: NotImplemented
#8883 commented on
Jun 16, 2025 • 0 new comments -
Can't parallelize non-pickable function with initializer in Pool
#8876 commented on
Jun 16, 2025 • 0 new comments -
DQN Minibatch Option
#8870 commented on
Jun 16, 2025 • 0 new comments -
tune: module 'tensorflow' has no attribute __version__ in Ray Trainable since v0.7.7
#8729 commented on
Jun 16, 2025 • 0 new comments -
Blank redis-password gives wrong message to add node
#8629 commented on
Jun 16, 2025 • 0 new comments -
absl.logging inside remote tasks does not get printed
#8625 commented on
Jun 16, 2025 • 0 new comments -
Invalid iterator dereference in TestReconstructionChain (fails in debug mode)
#8587 commented on
Jun 16, 2025 • 0 new comments -
Can't pickle CudnnModule objects
#8569 commented on
Jun 16, 2025 • 0 new comments -
Reducing unnecessary process overhead in practice
#8522 commented on
Jun 16, 2025 • 0 new comments -
[tune]Error in BOHB perhaps caused by different trainable instances running in the same Trial ???
#8455 commented on
Jun 16, 2025 • 0 new comments -
incompatible with 'msgpack_numpy.patch()' function
#8409 commented on
Jun 16, 2025 • 0 new comments -
Error connecting to Redis server at 127.0.0.1:35709
#8389 commented on
Jun 16, 2025 • 0 new comments -
Error while shutting down Ray
#8385 commented on
Jun 16, 2025 • 0 new comments -
[ray] Pyarmor compatibility
#8365 commented on
Jun 16, 2025 • 0 new comments -
[ray] Can RAY pause and continue tasks distributed to the cluster's nodes?
#8263 commented on
Jun 16, 2025 • 0 new comments -
failed on virtual environment
#6735 commented on
Jun 16, 2025 • 0 new comments -
Managing memory during long loops
#6717 commented on
Jun 16, 2025 • 0 new comments -
Not able to reproduce speed performance improvements using ray on my machine
#6716 commented on
Jun 16, 2025 • 0 new comments -
[tune] Logs don't sync up to workers on restore
#6702 commented on
Jun 16, 2025 • 0 new comments -
The remote_function.options is not documented.
#6699 commented on
Jun 16, 2025 • 0 new comments -
[tune] More robust checkpoint garbage collection
#6697 commented on
Jun 16, 2025 • 0 new comments -
Fault tolerance to dead actors
#6670 commented on
Jun 16, 2025 • 0 new comments -
ray.wait's num_returns should not fail if num_returns > len(results)
#6667 commented on
Jun 16, 2025 • 0 new comments -
Parallel execution of multiple dataframes by dividing them into sub-frames
#6640 commented on
Jun 16, 2025 • 0 new comments -
Batch Norm example failing under APEX
#6638 commented on
Jun 16, 2025 • 0 new comments -
limiting tensorflow memory failed in actor or function
#6633 commented on
Jun 16, 2025 • 0 new comments -
Remote function is executed in python `exec` with empty local/global will fails
#6620 commented on
Jun 16, 2025 • 0 new comments -
[tune] Estimate timing
#6618 commented on
Jun 16, 2025 • 0 new comments -
[streaming] Add micro batching feature
#6607 commented on
Jun 16, 2025 • 0 new comments -
Package reference should include task & actor APIs
#6566 commented on
Jun 16, 2025 • 0 new comments -
Serialization is 20% slower from 0.7.6 -> 0.7.7
#6551 commented on
Jun 16, 2025 • 0 new comments -
[ray] How to write into numpy arrays in shared memory with Ray?
#6507 commented on
Jun 16, 2025 • 0 new comments -
Support for mxnet.ndarray?
#6494 commented on
Jun 16, 2025 • 0 new comments -
[ray] Handle memory pressure more gracefully
#6458 commented on
Jun 16, 2025 • 0 new comments -
Reloading module changes in workers
#6449 commented on
Jun 16, 2025 • 0 new comments -
[tune] [serve] Don't use daemon threads
#6421 commented on
Jun 16, 2025 • 0 new comments -
Terminal freezes after setting @ray.remote(num_gpu=2)
#6418 commented on
Jun 16, 2025 • 0 new comments -
Ray does not preserve requires_grad attribute
#6405 commented on
Jun 16, 2025 • 0 new comments -
Ray over mpi for supercomputers
#6344 commented on
Jun 16, 2025 • 0 new comments -
[Dashboard] Ray Dashboard sometimes auto refreshes to point to wrong job id temporarily.
#45662 commented on
Jun 6, 2025 • 0 new comments -
Ray dashboard integration
#7383 commented on
Jun 16, 2025 • 0 new comments -
Do not suggest calling __ray_terminate__ directly
#7382 commented on
Jun 16, 2025 • 0 new comments -
ray.services.get_node_ip_address doesn't work well if there is a local proxy
#7316 commented on
Jun 16, 2025 • 0 new comments -
Provide abstraction/interface to implement resource isolation for custom resources
#7204 commented on
Jun 16, 2025 • 0 new comments -
[cross-language]Problem about cross language data layout
#7191 commented on
Jun 16, 2025 • 0 new comments -
Documentation for connecting to ray cluster could be improved
#7186 commented on
Jun 16, 2025 • 0 new comments -
ray.experimental.queue is very slow
#7172 commented on
Jun 16, 2025 • 0 new comments -
Using asserts for argument checks is probably a bad idea
#7171 commented on
Jun 16, 2025 • 0 new comments -
Ray Issue: The class state is never hold by passing it to the remote function/actors if the class is defined in separate files
#7160 commented on
Jun 16, 2025 • 0 new comments -
[core] Gets timeout on randomly generated ObjectIDs
#7074 commented on
Jun 16, 2025 • 0 new comments -
Allow remote functions to require running on a fresh worker
#7059 commented on
Jun 16, 2025 • 0 new comments -
How to use Ray with closures?
#7055 commented on
Jun 16, 2025 • 0 new comments -
The project `setup.py` script doesn't install tools needed by `ci/travis/format.sh`
#6999 commented on
Jun 16, 2025 • 0 new comments -
Don't run Java or sanitizer tests when only Python changes.
#6992 commented on
Jun 16, 2025 • 0 new comments -
ray plasma object store connection refused after 24hrs
#6988 commented on
Jun 16, 2025 • 0 new comments -
Sharing in memory
#6976 commented on
Jun 16, 2025 • 0 new comments -
[ray] ray on slurm not respecting memory limits
#6968 commented on
Jun 16, 2025 • 0 new comments -
Unable to override ray's default logging format
#6965 commented on
Jun 16, 2025 • 0 new comments -
MADDPG used onto a MultiEnv does not show learning.
#6949 commented on
Jun 16, 2025 • 0 new comments -
How to throttle process to avoid "UnreconstructableError"
#6892 commented on
Jun 16, 2025 • 0 new comments -
pip install from source requires --editable/-e flag
#6845 commented on
Jun 16, 2025 • 0 new comments -
[scheduling] Default actor lifetime resources (0 CPUs) cause cluster not to be saturated
#6814 commented on
Jun 16, 2025 • 0 new comments -
How to Reduce Memory Usage for Creating Actor?
#6778 commented on
Jun 16, 2025 • 0 new comments -
Reconstruction semantics around failing actor constructor.
#6768 commented on
Jun 16, 2025 • 0 new comments -
[Deploy]Ray on Yarn Deployment
#6753 commented on
Jun 16, 2025 • 0 new comments -
[Bug] [Workflow] ray.wait on workflow result doesn't work as expected
#19295 commented on
Jun 16, 2025 • 0 new comments -
[tune] MLFlowLogger doesn't save artifacts for remote mlflow tracking_uri
#19263 commented on
Jun 16, 2025 • 0 new comments -
[Bug] [XLang] Segfault when Java returns void
#18837 commented on
Jun 16, 2025 • 0 new comments -
[Bug] tensorboardX vs tensorboard?
#18727 commented on
Jun 16, 2025 • 0 new comments -
Dashboard exposes redis PW on the command line
#18491 commented on
Jun 16, 2025 • 0 new comments -
[core] ReferenceCountingAssertionError may be thrown if ObjectRef is passed through intermediate worker that dies
#18456 commented on
Jun 16, 2025 • 0 new comments -
Race condition of grpc backpressure
#18439 commented on
Jun 16, 2025 • 0 new comments -
[Core] Task spec including inlined objects can crash lease request RPCs.
#18194 commented on
Jun 16, 2025 • 0 new comments -
[Runtime Env] Setup process doesn't have CPU limit
#18137 commented on
Jun 16, 2025 • 0 new comments -
ray.init with address crashes process outside of cluster
#17769 commented on
Jun 16, 2025 • 0 new comments -
new dashboard agent port conflict issues
#17498 commented on
Jun 16, 2025 • 0 new comments -
[Core] Unable to get actor handle of global named actor created in java from python in Ray 1.4.0
#16436 commented on
Jun 16, 2025 • 0 new comments -
[serve] java api
#16393 commented on
Jun 16, 2025 • 0 new comments -
[serve] java serve handle
#16392 commented on
Jun 16, 2025 • 0 new comments -
[serve] java http proxy
#16391 commented on
Jun 16, 2025 • 0 new comments -
[Shuffle] non-streaming consumed bytes are too low compared to spilled / restored bytes.
#16149 commented on
Jun 16, 2025 • 0 new comments -
[ray] Multiple concurrent requests to create a named actor crash GCS
#15941 commented on
Jun 16, 2025 • 0 new comments -
Remove unused util functions for conda environments
#15912 commented on
Jun 16, 2025 • 0 new comments -
[core] Zero-gpu node shouldn't be marked with accelerator_type resource.
#15878 commented on
Jun 16, 2025 • 0 new comments -
Cannot using external model with cuda when using ray
#15869 commented on
Jun 16, 2025 • 0 new comments -
[wheel][doc] Make it easier to access Ray wheels for specific commits
#15765 commented on
Jun 16, 2025 • 0 new comments -
[rllib]Update the docs about Variable-length / Parametric Action Space
#15710 commented on
Jun 16, 2025 • 0 new comments -
Odd task scheduling behavior on same node
#15602 commented on
Jun 16, 2025 • 0 new comments -
Averaging learning curves over repetitions + plotting confidence intervals [Tune]
#15400 commented on
Jun 16, 2025 • 0 new comments -
AssertionError when using pyinstaller with ray
#15396 commented on
Jun 16, 2025 • 0 new comments -
[metrics] Better way of grouping metric definitions
#10341 commented on
Jun 16, 2025 • 0 new comments -
[RLLib] Workers died at the initialization stage when the observation space is a 3D shape
#22033 commented on
Jun 16, 2025 • 0 new comments -
[Train] Automatically choose number of workers
#21987 commented on
Jun 16, 2025 • 0 new comments -
[Serve] The adjustment about Ray Serve Java Proxy and Java Replica
#21694 commented on
Jun 16, 2025 • 0 new comments -
[C++] Cluster Mode Tests Should have 1 test per feature tested
#21454 commented on
Jun 16, 2025 • 0 new comments -
[Tune] [Bug] lazily expand directories for client compatibility
#21408 commented on
Jun 16, 2025 • 0 new comments -
[Tune] Issue on page /tune/tutorials/tune-pytorch-lightning.html
#21354 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Got stuck when running python script from a shell script
#21298 commented on
Jun 16, 2025 • 0 new comments -
[Bug] [Tune] pbt run_experiments not stable, some trial will error.
#21259 commented on
Jun 16, 2025 • 0 new comments -
[Train] Document Callbacks
#21066 commented on
Jun 16, 2025 • 0 new comments -
[Feature] Single source of truth for Ray version in Java `pom.xml` and `pom_template.xml` files
#21059 commented on
Jun 16, 2025 • 0 new comments -
[Test Bug] Matching `psutil.Process.name()` doesn't work on macOS
#20982 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Incorrect promise usage that causes infinite blocking calls
#20899 commented on
Jun 16, 2025 • 0 new comments -
We encountered the cast exception after we got result from ray actor task
#20369 commented on
Jun 16, 2025 • 0 new comments -
[Train] Refactor `TrainingIterator` result processing logic
#20330 commented on
Jun 16, 2025 • 0 new comments -
[tsan] Add TSAN CI build that runs basic Python tests
#20080 commented on
Jun 16, 2025 • 0 new comments -
[tsan] Race in census SetGlobalTags
#20079 commented on
Jun 16, 2025 • 0 new comments -
[tsan] Race accessing global stats objects
#20078 commented on
Jun 16, 2025 • 0 new comments -
[tsan] Several global config variables accessed unsafely
#20077 commented on
Jun 16, 2025 • 0 new comments -
Support working_dir=None for skipping packaging upload/download
#19962 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Placement group removal refinement
#19937 commented on
Jun 16, 2025 • 0 new comments -
[Feature] Able to access objects put in cross language
#19873 commented on
Jun 16, 2025 • 0 new comments -
[Bug] Improve RuntimeEnvSetupError message
#19824 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Deprecate Internally Maintained Probability Distributions In Favor Of Native TFP And torch.distributions Solutions
#19725 commented on
Jun 16, 2025 • 0 new comments -
[Serve] Test KVStore early in constructor init.
#19714 commented on
Jun 16, 2025 • 0 new comments -
ValueError: After taking into account object store and redis memory usage, the amount of memory on this node available for tasks and actors (-0.01 GB) is less than 0% of total. You can adjust these settings with ray.init(memory=<bytes>, object store memory=<bytes>
#12561 commented on
Jun 16, 2025 • 0 new comments -
Ray grinds to a halt if both PyTorch and TensorFlow are installed
#12467 commented on
Jun 16, 2025 • 0 new comments -
Ray does not handle MIG devices
#12413 commented on
Jun 16, 2025 • 0 new comments -
[tune] progress reporter should limit table to 80char
#12374 commented on
Jun 16, 2025 • 0 new comments -
[serve] Distributed Tracing Support in Serve
#12320 commented on
Jun 16, 2025 • 0 new comments -
[metrics] Replace ray timeline with distributed tracing
#12315 commented on
Jun 16, 2025 • 0 new comments -
[metrics] Support filtering logs streamed to driver by actor/task
#12305 commented on
Jun 16, 2025 • 0 new comments -
[serve] Support more expressive policies for choosing replicas
#12296 commented on
Jun 16, 2025 • 0 new comments -
[Tune] [PBT] Automatic experiment restart for synch=True
#12122 commented on
Jun 16, 2025 • 0 new comments -
[tune] [wandb] Experiment checkpointing fails with `WandbTrainableMixin`
#11917 commented on
Jun 16, 2025 • 0 new comments -
[tune] quniform distribution
#11879 commented on
Jun 16, 2025 • 0 new comments -
[docs] improve tune distributed tuning guide
#11681 commented on
Jun 16, 2025 • 0 new comments -
[tune] doc should indicate print output
#11679 commented on
Jun 16, 2025 • 0 new comments -
[cli] attach `--tmux` should show parallel command output
#11678 commented on
Jun 16, 2025 • 0 new comments -
[tune] Client API improvements
#11676 commented on
Jun 16, 2025 • 0 new comments -
[cloudpickle] Too much override for cloudpickle, breaks scikit-learn usage
#11547 commented on
Jun 16, 2025 • 0 new comments -
[docs] search results don't link to correct tab
#11288 commented on
Jun 16, 2025 • 0 new comments -
[Autoscaler] Prioritize infeasible bundles and placement group rescheduling
#11259 commented on
Jun 16, 2025 • 0 new comments -
Remove the `remove_after_get` flag
#10977 commented on
Jun 16, 2025 • 0 new comments -
[placement groups] Feasibility Check
#10913 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] Add unit tests for sdk.py
#10903 commented on
Jun 16, 2025 • 0 new comments -
Cannot call remote instance method of a superclass from within a different instance method of the superclass
#10899 commented on
Jun 16, 2025 • 0 new comments -
Add testing to `commands.py`/`NodeUpdaterThread` level
#10846 commented on
Jun 16, 2025 • 0 new comments -
Treat CPUs as abstract resources
#10818 commented on
Jun 16, 2025 • 0 new comments -
Installing ray on powerpc
#10774 commented on
Jun 16, 2025 • 0 new comments -
[core] Memory leak when using local simulated cluster (long_running_tests/workloads/apex.py)
#15305 commented on
Jun 16, 2025 • 0 new comments -
[Core] Bad traceback on failure to reconnect to GCS server.
#15235 commented on
Jun 16, 2025 • 0 new comments -
[metrics] Custom sum metrics have type comment "gauge"
#15150 commented on
Jun 16, 2025 • 0 new comments -
[core] Actor restart does not work when owner dies and constructor task has dependencies
#15076 commented on
Jun 16, 2025 • 0 new comments -
[k8s] ray down command does not remove pods which are in evicted state
#14958 commented on
Jun 16, 2025 • 0 new comments -
[Tune] [Ray Client] tune_cifar10_gluon example fails with Ray Client
#14946 commented on
Jun 16, 2025 • 0 new comments -
[ray white paper] broken links
#14897 commented on
Jun 16, 2025 • 0 new comments -
Fix Asyncio Event Metrics on Java
#14715 commented on
Jun 16, 2025 • 0 new comments -
Add ray.__wheel__ with a link to the wheel to install the same version
#14623 commented on
Jun 16, 2025 • 0 new comments -
Pre-push hooks allow code to be pushed that fails LINT
#14367 commented on
Jun 16, 2025 • 0 new comments -
__del__ magic method can't access class properties
#14285 commented on
Jun 16, 2025 • 0 new comments -
Failed to load actor due to dependencies not being pickled
#14284 commented on
Jun 16, 2025 • 0 new comments -
optimization: Client blocks on releasing references due to detached actor race condition
#14137 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] request resources doesn't work with multiple jobs
#13534 commented on
Jun 16, 2025 • 0 new comments -
[Metrics] Custom metrics don't work after calling `ray.shutdown()` followed by `ray.init()`
#13532 commented on
Jun 16, 2025 • 0 new comments -
Unify linting of clang-format and *.proto files
#13465 commented on
Jun 16, 2025 • 0 new comments -
Hang or Deadlock when calling ray.get() inside pytorch Dataset when DataLoader with num_workers >0
#13407 commented on
Jun 16, 2025 • 0 new comments -
[core] Unwanted pickling behaviour when starting remote actor with @property
#13365 commented on
Jun 16, 2025 • 0 new comments -
Explore Protos as the Ray Client pickle transport (instead of namedtuples)
#13280 commented on
Jun 16, 2025 • 0 new comments -
SIGKILL generates core dumps on some systems
#13221 commented on
Jun 16, 2025 • 0 new comments -
Object store thrashing if it runs ray.get in a non-main thread.
#12906 commented on
Jun 16, 2025 • 0 new comments -
Canonicalize the python lint options
#12801 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] refactor duplicate code for handling request_resources().
#12699 commented on
Jun 16, 2025 • 0 new comments -
[Dashboard]Profile Actor Button Not Working
#12668 commented on
Jun 16, 2025 • 0 new comments -
[core] bytearray is parsed as bytes in remote function
#12648 commented on
Jun 16, 2025 • 0 new comments -
[Ray Dashboard] SearchSelect bug makes search have no All options
#45680 commented on
Jun 6, 2025 • 0 new comments -
RLlib: beta1 as a Tensor is not supported for capturable=False and foreach=True
#51560 commented on
Jun 6, 2025 • 0 new comments -
[Autoscaler | Serve] No autoscaler warning when trying to run a tasks when no resources are available.
#36896 commented on
Jun 6, 2025 • 0 new comments -
Custom policy
#51155 commented on
Jun 6, 2025 • 0 new comments -
[serve] Add documentation page about concurrency model & best practices
#48902 commented on
Jun 6, 2025 • 0 new comments -
[RLlib] Running RLlib example using Actor causes worker to die unexpectedly
#47820 commented on
Jun 6, 2025 • 0 new comments -
[Serve] refactor serve code that sets `docs_path`
#53023 commented on
Jun 6, 2025 • 0 new comments -
[RAY TRAIN] Force use of gloo in Windows
#49778 commented on
Jun 6, 2025 • 0 new comments -
[RLlib][Debugging] Inconsistency: The example deterministic_training.py does not provide the determinism that deterministic.py / torch_utils.py provide
#50115 commented on
Jun 6, 2025 • 0 new comments -
[Data] Evaluate support for Arrow's native Tensor types
#51965 commented on
Jun 6, 2025 • 0 new comments -
[Data] Implement proper limit pushdown
#51966 commented on
Jun 6, 2025 • 0 new comments -
[data][bug] Dataset execution can be implicitly triggered when passing a dataset to an Actor.
#52549 commented on
Jun 6, 2025 • 0 new comments -
[LLM/Data] lazy import for transformers
#52632 commented on
Jun 6, 2025 • 0 new comments -
[Data] RayData driver process crashes when some worker(pod) been preempted
#52815 commented on
Jun 6, 2025 • 0 new comments -
[Data] Support partial_sql API in Ray Data - Integration with DuckDB/Polars
#52390 commented on
Jun 6, 2025 • 0 new comments -
[Core] Make sure Actor's `__del__` method invoked on Actor's destruction
#53169 commented on
Jun 6, 2025 • 0 new comments -
[RLlib] PPO algorithm can't be trained from checkpoint
#50136 commented on
Jun 6, 2025 • 0 new comments -
[Ray Core] Ray scheduler is abnormally slow
#53077 commented on
Jun 6, 2025 • 0 new comments -
[DATA] Ray Data Autoscaling Ignores Custom Resources
#49589 commented on
Jun 5, 2025 • 0 new comments -
[data] ray write lance error
#49211 commented on
Jun 5, 2025 • 0 new comments -
[<Ray component: data] `ray.data.read_text` raise `numpy.core._exceptions._ArrayMemoryError: Unable to allocate`
#46293 commented on
Jun 5, 2025 • 0 new comments -
[data] Ray Data adds >100ms delay before producing the first batch of a Dataset
#42376 commented on
Jun 5, 2025 • 0 new comments -
[data] Add memory usage of streaming_split clients to resource accounting for backpressure
#39595 commented on
Jun 5, 2025 • 0 new comments -
[data] Optimize actor pool autoscaling policy
#41956 commented on
Jun 5, 2025 • 0 new comments -
[DASHBOARD] Dashboard cannot detect live workers
#33326 commented on
Jun 6, 2025 • 0 new comments -
[State API/Dashboard] UX issues with Ray list actors
#33484 commented on
Jun 6, 2025 • 0 new comments -
[autoscaler][observability][dashboard] Wrong metrics are being used for autoscaler graphs in dashboard
#33550 commented on
Jun 6, 2025 • 0 new comments -
[Ray Dashboard] Node link in the event doesn't work
#33977 commented on
Jun 6, 2025 • 0 new comments -
[Dashboard] Metrics in the serve app UI shouldn't appear when there is no serve app
#34013 commented on
Jun 6, 2025 • 0 new comments -
[Dashboard] Cannot copy all the logs (logs that expand beyond 1 page) of a file in log UI
#34180 commented on
Jun 6, 2025 • 0 new comments -
[Dashboard][Observability] Dashboard shows that a task is still running after a RayCluster with GCS FT restarts
#34507 commented on
Jun 6, 2025 • 0 new comments -
[Dashboard] Hide actions for dead entities and small polish items
#34511 commented on
Jun 6, 2025 • 0 new comments -
[Ray State Observability] unable to connect to GCS when using ray logs
#35103 commented on
Jun 6, 2025 • 0 new comments -
[Dashboard] Logs view is broken after head node is restarted by GCS FT
#35131 commented on
Jun 6, 2025 • 0 new comments -
[Dashboard] Ray Dashboard not showing tasks view
#35132 commented on
Jun 6, 2025 • 0 new comments -
[Dashboard] Ray dashboard struggles to load when the job history accumulates
#35202 commented on
Jun 6, 2025 • 0 new comments -
[State api/dashboard] Improvements to the GC policy of task backend and dashboard UI (better explanations about it)
#35723 commented on
Jun 6, 2025 • 0 new comments -
[Core] Sporadic GCS timeout issue
#35870 commented on
Jun 6, 2025 • 0 new comments -
[Dashboard] Event table has weird scrollbar when the event message is long.
#36828 commented on
Jun 6, 2025 • 0 new comments -
[Dashboard] Ray serve page returns 503
#36889 commented on
Jun 6, 2025 • 0 new comments -
Ray Dashboard: Prometheus Health Check
#38425 commented on
Jun 6, 2025 • 0 new comments -
[Dashboard] Print node status from `ray status -v` and card shouldn't grow in height
#38520 commented on
Jun 6, 2025 • 0 new comments -
[metrics] Actor metrics are split between method calls, making it hard to see totals
#38831 commented on
Jun 6, 2025 • 0 new comments -
[Ray Dashboard] Placement group task name is a bit confusing.
#39662 commented on
Jun 6, 2025 • 0 new comments -
[Core] After the task ends, the memory of IDLE and spill worker does not release
#34613 commented on
Jun 6, 2025 • 0 new comments -
[Core] Dashboard metrics don't match with each other
#34957 commented on
Jun 6, 2025 • 0 new comments -
[Ray dashboard] Actors tab does not list actors under certain conditions
#47447 commented on
Jun 6, 2025 • 0 new comments -
[Observability] Ray Dashboard and Metrics aren't listing Driver by default
#50097 commented on
Jun 6, 2025 • 0 new comments -
Dashboard profiling timeline is missing many events
#42496 commented on
Jun 6, 2025 • 0 new comments -
[Core][API] get error from ObjectRef without the overhead of fetching the actual data
#32817 commented on
Jun 4, 2025 • 0 new comments -
[Rllib] new API stack crashes when using Repeated space
#52093 commented on
Jun 4, 2025 • 0 new comments -
[core] Generate *.pyi stubs for protobufs
#52482 commented on
Jun 4, 2025 • 0 new comments -
[Serve] DeepSeek-R1 mode load stuck in H20
#50975 commented on
Jun 3, 2025 • 0 new comments -
[Core] Ray schedules 2 actors requesting 1.25 GPUs total on the same single GPU
#52915 commented on
Jun 3, 2025 • 0 new comments -
[Core] Redis health check lacks timeout detection.
#52933 commented on
Jun 3, 2025 • 0 new comments -
[core] ActorHandle remote() return type hint should be ObjectRef not Unknown
#52772 commented on
Jun 3, 2025 • 0 new comments -
[core] actor constructor type hint should be ActorHandle
#52771 commented on
Jun 3, 2025 • 0 new comments -
[core] Separate Ray Usage Status collection from submission
#53362 commented on
Jun 3, 2025 • 0 new comments -
[Ray Dashboard] Log output cannot be viewed after the node is scaled down
#46837 commented on
Jun 3, 2025 • 0 new comments -
[RLLIB] PPO calls ValueFunctionAPI with batch size
#52432 commented on
Jun 3, 2025 • 0 new comments -
[RFC] [Serve] Custom Request Router
#53016 commented on
Jun 3, 2025 • 0 new comments -
[Serve] reason_content is null returned by llm serve
#53324 commented on
Jun 3, 2025 • 0 new comments -
[Data] Ray keeps adding nodes beyond Dataset.map concurrency
#52573 commented on
Jun 2, 2025 • 0 new comments -
[Data] ray.data.from_huggingface still fails on multi-node clusters – ModuleNotFoundError: datasets_modules
#52708 commented on
Jun 2, 2025 • 0 new comments -
[Core] Python 3.13 wheel
#49738 commented on
Jun 2, 2025 • 0 new comments -
[Core] Support arbitrarily s3 endpoint in working dir when downloading package in runtime_env
#30601 commented on
Jun 1, 2025 • 0 new comments -
[CORE] actor atexit is not called when SIGTERM receive
#50004 commented on
Jun 1, 2025 • 0 new comments -
Two IPs for Ray worker nodes: one for in-cluster communication and another for communication within the node itself?
#51402 commented on
May 31, 2025 • 0 new comments -
[RLlib] bug: env_to_module pipeline is run twice (on done) when "early-out"
#53053 commented on
May 30, 2025 • 0 new comments -
[Serve.llm] vLLMDeployment throughput doesn't scale well with `n_replicas`.
#53356 commented on
May 30, 2025 • 0 new comments -
Usability: The Queue is slow & something like ZMQ is almost always preferable
#53010 commented on
May 29, 2025 • 0 new comments -
[data] Zero-sized blocks crashes write_bigquery
#51892 commented on
May 29, 2025 • 0 new comments -
[Core]Separate Environment Variables for ray.init() and ray ctl to Reflect Different Protocols and Ports
#53226 commented on
May 29, 2025 • 0 new comments -
Ray core: `AttributeError: 'Worker' object has no attribute 'core_worker'`
#47759 commented on
May 29, 2025 • 0 new comments -
[Data, Train] ray::SplitCoordinator is very slow at every epoch + takes up too much memory
#49190 commented on
Jun 5, 2025 • 0 new comments -
ray.data.Dataset.random_sample does not return a new dataset
#53234 commented on
Jun 5, 2025 • 0 new comments -
[Data] Dictionary changed size during iteration runtime error with streaming split and iter_batches
#53268 commented on
Jun 5, 2025 • 0 new comments -
[Data] Stratification in train_test_split
#53297 commented on
Jun 5, 2025 • 0 new comments -
[Data] Stratified sampling with groupby.sample
#53296 commented on
Jun 5, 2025 • 0 new comments -
[Data] Improve error handling in `_align_struct_fields`
#52939 commented on
Jun 5, 2025 • 0 new comments -
[Doc] [Data] broken page for repartition()
#52732 commented on
Jun 5, 2025 • 0 new comments -
[Core] ray distributed debugger, always connecting to cluster..
#50682 commented on
Jun 5, 2025 • 0 new comments -
[core] Ray should properly handle actor process shutdown failures
#52322 commented on
Jun 4, 2025 • 0 new comments -
[core][API] Provide an API to wait for the cluster to reach specific conditions
#51860 commented on
Jun 4, 2025 • 0 new comments -
[Core] Find if an ObjectRef failed without an expensive ray.get() call
#52189 commented on
Jun 4, 2025 • 0 new comments -
[Job] Ray job log streaming misses to report the last log line
#46413 commented on
Jun 4, 2025 • 0 new comments -
[core] PyGILState_Release: thread state must be current when releasing
#53341 commented on
Jun 4, 2025 • 0 new comments -
[Ray Train] FileNotFoundError '/tmp/ray/sessio_xxxx/xxxx/.tmp_generator'
#51020 commented on
Jun 4, 2025 • 0 new comments -
[RayTrain] ScalingConfig resources_per_worker input validation/error handling
#49372 commented on
Jun 4, 2025 • 0 new comments -
Type annotation of `datasets` is unnecessarily invariant, but could be covariant
#48986 commented on
Jun 4, 2025 • 0 new comments -
[RayTrain] Manual checkpoint persistence to storage
#52762 commented on
Jun 4, 2025 • 0 new comments -
[Train] Add support for NeMo Megatron strategy with lightning
#51387 commented on
Jun 4, 2025 • 0 new comments -
[Tune, Train] Ray Tune DDP Hyperparameter Search - EvalPrediction Logging Issues with Multiple Eval Datasets
#51788 commented on
Jun 4, 2025 • 0 new comments -
[Core,Trainer] Actor and Trainer cannot work together for AMD-Instinct-MI250X-MI250
#51985 commented on
Jun 4, 2025 • 0 new comments -
[Ray debugger] - Debugger doesn't work when running ray.train.TorchTrainer
#53022 commented on
Jun 4, 2025 • 0 new comments -
[runtime_env] Reference counter doesn't handle multiple options using the same URI
#52578 commented on
Jun 4, 2025 • 0 new comments -
[Core]Ray head crashed silently - improve observability for redis timeouts causing said crash
#47419 commented on
Jun 4, 2025 • 0 new comments -
[Ray Core] ray.wait with num_returns=1 is pretty slow
#49905 commented on
Jun 4, 2025 • 0 new comments -
[core][experimental] Throw error if DAG actor task would hang due to shared memory outputs still in scope
#46055 commented on
Jun 4, 2025 • 0 new comments -
[Serve] `serve.run` can bind the incorrect Application if Deployments have the same name
#53295 commented on
Jun 16, 2025 • 0 new comments -
[Serve] RayServe Pods Stuck in Unready State Causing API Outages
#53323 commented on
Jun 16, 2025 • 0 new comments -
[Serve] Support generics for DeploymentHandle type hints
#52654 commented on
Jun 16, 2025 • 0 new comments -
[Ray Complied Graph] NCCL Internal Error
#49827 commented on
Jun 16, 2025 • 0 new comments -
[Data] Get Dataset size from DataIterator
#37634 commented on
Jun 15, 2025 • 0 new comments -
[Core|Dataset] Ray job stuck with idle actors with no tasks
#45822 commented on
Jun 13, 2025 • 0 new comments -
Global Per-Epoch Shuffling with TorchTrainer
#47460 commented on
Jun 13, 2025 • 0 new comments -
[LLM] We need to create a more robust way of handling actor shutdown
#53179 commented on
Jun 13, 2025 • 0 new comments -
[Core] Prevent schedulling non-GPU tasks to GPU nodes
#47866 commented on
Jun 12, 2025 • 0 new comments -
[Core | Serve] Compatibility issue with pydantic>=2.10
#52211 commented on
Jun 12, 2025 • 0 new comments -
[RLlib] Env runners error out when interacting with Repeated observation spaces
#53327 commented on
Jun 12, 2025 • 0 new comments -
[RLlib] Type of `AlgorithmConfig.training(learner_connector` is wrong
#53368 commented on
Jun 12, 2025 • 0 new comments -
[RLlib] gym.spaces.Sequence unbatching error
#53293 commented on
Jun 12, 2025 • 0 new comments -
[Serve] Proxy actor not started on worker node when using kuberay
#50349 commented on
Jun 12, 2025 • 0 new comments -
[Core] Turn off RayTaskError cause wrapping functionality
#48320 commented on
Jun 11, 2025 • 0 new comments -
[Core] `ray job submit` doesn't always catch the last lines of the job logs
#48701 commented on
Jun 11, 2025 • 0 new comments -
[Core] Ray dashboard agent high memory usage
#52639 commented on
Jun 11, 2025 • 0 new comments -
[Data] [optimizer] map/map_batches should output the same number of rows as the input
#36295 commented on
Jun 11, 2025 • 0 new comments -
[llm] Roadmap for Data and Serve LLM APIs
#51313 commented on
Jun 11, 2025 • 0 new comments -
[Train] Support for `lightning.pytorch` on the `mps` backend
#49858 commented on
Jun 10, 2025 • 0 new comments -
[Bug] Dashboard can't start with TLS on
#22466 commented on
Jun 10, 2025 • 0 new comments -
Support Availability Zone Deployment in Azure
#39966 commented on
Jun 10, 2025 • 0 new comments -
ray azure does not work out of the box
#52511 commented on
Jun 10, 2025 • 0 new comments -
Ray serve + core steaming is slow at high concurrency
#52745 commented on
Jun 10, 2025 • 0 new comments -
[ray.serve.llm] serve.llm with streaming has overhead compared to vllm-v0 for a single replica when concurrency > 32
#52746 commented on
Jun 10, 2025 • 0 new comments -
Support of Ray Decorator for Built in Functions
#6308 commented on
Jun 16, 2025 • 0 new comments -
[docs] Issue on `tune-schedulers.rst`
#6063 commented on
Jun 16, 2025 • 0 new comments -
Can I set priority for my tasks
#6057 commented on
Jun 16, 2025 • 0 new comments -
Avoid putting the redis password in plain text in processlist
#5872 commented on
Jun 16, 2025 • 0 new comments -
Handling `use_pickle=True` with pickle5 serializer and performance regression
#5856 commented on
Jun 16, 2025 • 0 new comments -
Install ray with conda but not pip
#5511 commented on
Jun 16, 2025 • 0 new comments -
[tune] saving mechanism and PBT
#5312 commented on
Jun 16, 2025 • 0 new comments -
Feature request: An API to wait until there are are X resources available
#5243 commented on
Jun 16, 2025 • 0 new comments -
[Feature request] Also expose python function after decorating with ray.remote
#4981 commented on
Jun 16, 2025 • 0 new comments -
Creative action space support: contains method, action interpoalation.
#4837 commented on
Jun 16, 2025 • 0 new comments -
__module__ can be None
#4758 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] Autoscaler UX Issues
#4656 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] Add tests that mock endpoints for AWS, GCE
#4303 commented on
Jun 16, 2025 • 0 new comments -
Python Worker class should have proper constructor and destructor.
#3961 commented on
Jun 16, 2025 • 0 new comments -
Should not ignore "AttributeError"
#3820 commented on
Jun 16, 2025 • 0 new comments -
Backend timing statements should be made type safe.
#3341 commented on
Jun 16, 2025 • 0 new comments -
Make it possible to limit memory usage of processes
#3055 commented on
Jun 16, 2025 • 0 new comments -
Task submission from local scheduler client is blocking
#2940 commented on
Jun 16, 2025 • 0 new comments -
Add test for numpy array alignment.
#2937 commented on
Jun 16, 2025 • 0 new comments -
Allow ray.get and ray.wait to take in additional argument types
#2126 commented on
Jun 16, 2025 • 0 new comments -
Remove the import thread from the workers and driver.
#951 commented on
Jun 16, 2025 • 0 new comments -
Remote decorator fails on jitted function.
#593 commented on
Jun 16, 2025 • 0 new comments -
Actors do not work properly with subclasses that call super.
#449 commented on
Jun 16, 2025 • 0 new comments -
Methods on actors inherited from built-in classes are not visible
#278 commented on
Jun 16, 2025 • 0 new comments -
[Core][StreamingGenerator] `ray.get` will hang when the node on which the streaming task is running fails.
#47582 commented on
Jun 16, 2025 • 0 new comments -
[Dashboard] Support row in the dashboard.
#38024 commented on
Jun 9, 2025 • 0 new comments -
[Event] Support rotation for the event log files
#39591 commented on
Jun 9, 2025 • 0 new comments -
Troubleshooting the root cause that cluster_status is undefined
#40076 commented on
Jun 9, 2025 • 0 new comments -
[core][state] list objects show objects if spilled.
#31374 commented on
Jun 9, 2025 • 0 new comments -
[Core] Core metrics observed from worker nodes do not propagate to Prometheus
#31675 commented on
Jun 9, 2025 • 0 new comments -
[State API] ray log truncation message improvements
#32392 commented on
Jun 9, 2025 • 0 new comments -
[Core] Expose logs for runtime environment installation process on worker nodes for remote Ray clusters
#34310 commented on
Jun 9, 2025 • 0 new comments -
[Ray Dashboard] Set Route Prefix/Base Address
#35269 commented on
Jun 9, 2025 • 0 new comments -
[Dashboard] Add job retention mechanism
#35700 commented on
Jun 9, 2025 • 0 new comments -
[Dashboard] Should specify the time range in job detail page for load the cluster status and scale metrics
#41781 commented on
Jun 9, 2025 • 0 new comments -
[Ray debugger] Unable to use debugger on Ray Cluster on k8s
#45541 commented on
Jun 9, 2025 • 0 new comments -
[Core] Ray_tasks and ray_memory_manager_worker_eviction_total metrics should emit 0 instead of null for each state at start
#47616 commented on
Jun 9, 2025 • 0 new comments -
[Serve] Multiple FastAPI ingress deployments in a single application are not disallowed
#53024 commented on
Jun 8, 2025 • 0 new comments -
[Train] Allow customization of FPS for wandb logger; instead of slow 4 FPS
#50186 commented on
Jun 7, 2025 • 0 new comments -
[Core] Worker exit is not reported if the worker is dead by node exit
#24957 commented on
Jun 6, 2025 • 0 new comments -
[Core] Certain log files cannot be followed/streamed
#29928 commented on
Jun 6, 2025 • 0 new comments -
[Dashboard] Flame graph and stack trace links should be removed after the actors/workers are dead.
#31116 commented on
Jun 6, 2025 • 0 new comments -
[core][state][log] ray log should traverse old directory for worker logs
#31126 commented on
Jun 6, 2025 • 0 new comments -
core - ray logs CLI doesn't work for kubernetes raycluster
#31381 commented on
Jun 6, 2025 • 0 new comments -
[Dashboard] Warning events' severity levels are "Info"
#32012 commented on
Jun 6, 2025 • 0 new comments -
[Dashboard] what the memory column refers to in the node table is not clear
#32073 commented on
Jun 6, 2025 • 0 new comments -
[Job] The "Succeeded" job state is confusing when it is simply "Completed"
#32076 commented on
Jun 6, 2025 • 0 new comments -
[core][job] Change job related timestamp to use nanoseconds
#32163 commented on
Jun 6, 2025 • 0 new comments -
Failed to start the dashboard, return code 1
#32312 commented on
Jun 6, 2025 • 0 new comments -
[Dashboard] UI defects in object store memory column of node table
#32512 commented on
Jun 6, 2025 • 0 new comments -
Uv sync with project using Ray fails installing on Python 3.13
#52819 commented on
Jun 9, 2025 • 0 new comments -
[Data] Significant Memory Leak / OOM When Reading Large Parquet Files with RayData
#49158 commented on
Jun 9, 2025 • 0 new comments -
[container] Publish multi-architecture container images
#41727 commented on
Jun 9, 2025 • 0 new comments -
[distributed debugger] vscode extension does not accept windows path when configuring cluster
#53088 commented on
Jun 9, 2025 • 0 new comments -
[Data] Adding streaming capability for `ray.data.Dataset.unique`
#51207 commented on
Jun 9, 2025 • 0 new comments -
[Core] Identify Mac M1/M2 GPUs as valid GPUs
#39136 commented on
Jun 9, 2025 • 0 new comments -
Ray dashboard_url and prom_discovery.json files not scoped to session dir
#12662 commented on
Jun 9, 2025 • 0 new comments -
[Logs] Spdlog doesn't rotate raylet.out and gcs_server.out
#13466 commented on
Jun 9, 2025 • 0 new comments -
[core] Getting node IP address by object ref
#13630 commented on
Jun 9, 2025 • 0 new comments -
[metrics] Report general metrics for gRPC
#14368 commented on
Jun 9, 2025 • 0 new comments -
[metrics] ray.util.metrics API should closely mirror the prometheus python API
#14496 commented on
Jun 9, 2025 • 0 new comments -
[dashboard] Wonky GPU display
#14664 commented on
Jun 9, 2025 • 0 new comments -
[jobs] [Feature] Support streaming job logs to stdout/stderr
#23564 commented on
Jun 9, 2025 • 0 new comments -
[Core Observability] Include name to actor log prefix + process name
#24876 commented on
Jun 9, 2025 • 0 new comments -
[Jobs] Setting `RAY_LOG_TO_STDERR` results in empty job logs
#24886 commented on
Jun 9, 2025 • 0 new comments -
[core] Node IDs not consistent across APIs
#25090 commented on
Jun 9, 2025 • 0 new comments -
[Autoscaler/Core][Code quality] Handle autoscaler event logging through RPC, not logs.
#26186 commented on
Jun 9, 2025 • 0 new comments -
[autoscaler][logs] Improve status logging
#26670 commented on
Jun 9, 2025 • 0 new comments -
[State Observability] Improve ray list job implementation.
#26832 commented on
Jun 9, 2025 • 0 new comments -
[Core] timeline doesn't show all infos.
#28320 commented on
Jun 9, 2025 • 0 new comments -
[Core] Setting python log level for ray processes
#29758 commented on
Jun 9, 2025 • 0 new comments -
[Dashboard] [CI] Add tests for uncaught exceptions
#29809 commented on
Jun 9, 2025 • 0 new comments -
[Dashboard] Kill a job
#30182 commented on
Jun 9, 2025 • 0 new comments -
[UI] [Log viewer] Prevent the infinite loading for logs fetch
#36486 commented on
Jun 9, 2025 • 0 new comments -
[dashboard][UI] [TaskTable] Rendering task table has a 8s delay
#36656 commented on
Jun 9, 2025 • 0 new comments -
[Core] WorkerThreadContext semantics are incorrect for async Python actors.
#10324 commented on
Jun 16, 2025 • 0 new comments -
[ray] Programatically expose the amount of memory available in the object store
#10278 commented on
Jun 16, 2025 • 0 new comments -
[tune] Improve the serialization diagnoser by providing deeper introspection
#10263 commented on
Jun 16, 2025 • 0 new comments -
[tune] Usability issues
#10248 commented on
Jun 16, 2025 • 0 new comments -
[ray] Support mypy
#10244 commented on
Jun 16, 2025 • 0 new comments -
Removed the following hyperparameter values when logging to tensorboard: ... [tune]
#10166 commented on
Jun 16, 2025 • 0 new comments -
[dask-on-ray] ValueError on read-only memory
#10124 commented on
Jun 16, 2025 • 0 new comments -
[cli/docs] Provide example commands in the CLI docstrings.
#10079 commented on
Jun 16, 2025 • 0 new comments -
[Placement Group] Placement group dashboard
#9775 commented on
Jun 16, 2025 • 0 new comments -
Ray issue with serializing pytorch objects only when running on 40+ cores
#9752 commented on
Jun 16, 2025 • 0 new comments -
Ray typing IDE code completion support
#9623 commented on
Jun 16, 2025 • 0 new comments -
[Cluster][Task Schedule] Remote function is not executing without any errors
#9598 commented on
Jun 16, 2025 • 0 new comments -
[core] RayConfig does not get set properly after multiple `ray.init` calls
#9545 commented on
Jun 16, 2025 • 0 new comments -
[New scheduler] Performance optimization
#9487 commented on
Jun 16, 2025 • 0 new comments -
[New scheduler] Release testing
#9486 commented on
Jun 16, 2025 • 0 new comments -
Specify network interface to use / RuntimeError: Redis has started but no raylets have registered yet.
#9456 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] Autoscaler prints (harmless) errors every 30 mins and then kills the workers in GCP cluster
#9368 commented on
Jun 16, 2025 • 0 new comments -
[Core] Core Worker Actor Handle GC.
#9342 commented on
Jun 16, 2025 • 0 new comments -
Graph related applications
#9324 commented on
Jun 16, 2025 • 0 new comments -
Options Support for Actor Methods
#9296 commented on
Jun 16, 2025 • 0 new comments -
[ray] constant memory usage increase of actor using actor handle.
#9232 commented on
Jun 16, 2025 • 0 new comments -
Invalid memory access in RedisAsioClient/RedisAsyncContext on shutdown
#9074 commented on
Jun 16, 2025 • 0 new comments -
Performance issue with many large tasks on 10 node cluster.
#8950 commented on
Jun 16, 2025 • 0 new comments -
Ray Dashboard Head-node CLI [autoscaler]
#8450 commented on
Jun 16, 2025 • 0 new comments -
[tune] tutorial should indicate specific library version that we've tested against.
#11540 commented on
Jun 16, 2025 • 0 new comments -
[Core] Raylet can schedule tasks from a dead driver.
#11520 commented on
Jun 16, 2025 • 0 new comments -
`ray stop` should not kill all redis-server processes
#11513 commented on
Jun 16, 2025 • 0 new comments -
[core] Track the number of connection and use shared pool whenever possible for grpc clients.
#11445 commented on
Jun 16, 2025 • 0 new comments -
ray commandline tools raise exceptions if you forget the YAML config file
#11396 commented on
Jun 16, 2025 • 0 new comments -
[Autoscaler] Placement group rescheduling over-allocates resources
#11372 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] request_resources with partial instance availability leads to workers never shutting down
#11367 commented on
Jun 16, 2025 • 0 new comments -
how to add two-timescales Learning rate schedule in coustom policy?
#11328 commented on
Jun 16, 2025 • 0 new comments -
[Autoscaler] Add additional gpu types to util.accelerators
#11160 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] Worker node container is not removed after ray down?
#11098 commented on
Jun 16, 2025 • 0 new comments -
`ray stop` should wait for processes to exit
#10955 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] node type preferences
#10929 commented on
Jun 16, 2025 • 0 new comments -
Private/onprem clusters always need explicit ssh_private_key in docker
#10838 commented on
Jun 16, 2025 • 0 new comments -
[docs] Add examples for using custom resources
#10808 commented on
Jun 16, 2025 • 0 new comments -
Autoscaler should set RAY_ADDRESS environment variable
#10752 commented on
Jun 16, 2025 • 0 new comments -
Stop using `file_mounts` for ray_bootstrap_config & ray_bootstrap_key
#10743 commented on
Jun 16, 2025 • 0 new comments -
[Java] Remove Java 9/10/11 warnings
#10673 commented on
Jun 16, 2025 • 0 new comments -
[Documentation] need for default_resource_requests when using custom train function
#10572 commented on
Jun 16, 2025 • 0 new comments -
[rllib] action from policy with Tuple action space has wrong shape
#10516 commented on
Jun 16, 2025 • 0 new comments -
[tune] String summarization/representations for user objects
#10489 commented on
Jun 16, 2025 • 0 new comments -
[tune] Add regression test for avoiding extraneous output
#10485 commented on
Jun 16, 2025 • 0 new comments -
[GCS]Remove tightly coupled Redis code path from Python
#10359 commented on
Jun 16, 2025 • 0 new comments -
[GCS]Support Sharding GCS server
#10358 commented on
Jun 16, 2025 • 0 new comments -
[GCS]Support Multi-threaded GCS server.
#10357 commented on
Jun 16, 2025 • 0 new comments -
[GCS]Support different backend for GCS instead of Redis
#10356 commented on
Jun 16, 2025 • 0 new comments -
Task introspection
#2617 commented on
Jun 16, 2025 • 0 new comments -
ray start does not restart failed processes
#2587 commented on
Jun 16, 2025 • 0 new comments -
CI test linux://rllib:learning_tests_cartpole_dqn_gpu is flaky
#46683 commented on
Jun 16, 2025 • 0 new comments -
[rllib] flattening error in gym.spaces.Sequence
#45563 commented on
Jun 16, 2025 • 0 new comments -
cannot import name 'EPISODE_RETURN_MEAN' from 'ray.rllib.utils.metrics'
#45453 commented on
Jun 16, 2025 • 0 new comments -
error: No such option: --torch
#45452 commented on
Jun 16, 2025 • 0 new comments -
[Core] Unable to run worker with virtual environment without installing dashboard
#45410 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] How to support gymnasium graph obs space?
#45290 commented on
Jun 16, 2025 • 0 new comments -
Ray Cluster does not work across multiple docker containers
#45252 commented on
Jun 16, 2025 • 0 new comments -
[Core] Worker crashes unexpectedly due to frequent triggering of OOM
#45244 commented on
Jun 16, 2025 • 0 new comments -
Ray Cluster: Failed to create a ray cluster using running container
#45148 commented on
Jun 16, 2025 • 0 new comments -
[Rllib] Rllib provides wrong state batch size during "bug check" batches on torch custom model
#45131 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] ValueError in initialization of ImpalaTF2Policy
#45050 commented on
Jun 16, 2025 • 0 new comments -
[core] GcsSubscriber hangs in shutdown if the connection broke on MacOS
#45044 commented on
Jun 16, 2025 • 0 new comments -
Workflow: Reading workflow status can lead to corrupted json reads.
#45027 commented on
Jun 16, 2025 • 0 new comments -
[Core] `ray.wait` not actually wait until ready when the task is longer than 12 days
#44909 commented on
Jun 16, 2025 • 0 new comments -
[Data] Add `delete_dir_contents` parameter to `FileDatasink`
#44794 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] PPO and framework=tf / issue with latest tensorflow 2.16.1
#44675 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] PPO reset_config() AttributeError: 'dict' object has no attribute '_enable_new_api_stack'
#44506 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] ReplayBuffer doesnt work with zero_init_states False when store rnn sequence
#44383 commented on
Jun 16, 2025 • 0 new comments -
[Cluster, YARN with Skein] Ray cluster keeps crashing when running on YARN via Skein
#44112 commented on
Jun 16, 2025 • 0 new comments -
[Ray Core] Ray nightly GPU docker image broken on NVIDIA V100 GPUs on AWS
#43565 commented on
Jun 16, 2025 • 0 new comments -
Using RNN for RL
#43420 commented on
Jun 16, 2025 • 0 new comments -
from ray.rllib.agents.registry import get_trainer_class ModuleNotFoundError: No module named 'ray.rllib.agents'
#43310 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] AWS Single Sign-On support
#30064 commented on
Jun 16, 2025 • 0 new comments -
Support TPUs across all of Ray
#8260 commented on
Jun 16, 2025 • 0 new comments -
[core] ray.init does not work if run in a node with external ip while the cluster is started internally
#8244 commented on
Jun 16, 2025 • 0 new comments -
[docs][autoscaler] additional dependencies needs to be mentioned to build your own autoscaler image
#8235 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] Api instead of CLI to interact with cluster.
#8036 commented on
Jun 16, 2025 • 0 new comments -
Incorrect unreconstructable error message and raise different exception.
#7804 commented on
Jun 16, 2025 • 0 new comments -
ray.wait hangs with no warning or error when local object store is too small to receive object
#7802 commented on
Jun 16, 2025 • 0 new comments -
Segmentation Fault when using multiprocessing.Queue
#7793 commented on
Jun 16, 2025 • 0 new comments -
ray.wait with local_mode=True blocks for a very long time
#7741 commented on
Jun 16, 2025 • 0 new comments -
Awesome: algorithm selection helper & diagrams
#7722 commented on
Jun 16, 2025 • 0 new comments -
Ray hangs when machine is disconnected from network
#7696 commented on
Jun 16, 2025 • 0 new comments -
[docs] Clarify that in K8s the jobs need to be launched from the workers
#7188 commented on
Jun 16, 2025 • 0 new comments -
[ray] tasks running in docker containers are not stopped on local cluster
#6898 commented on
Jun 16, 2025 • 0 new comments -
[dist] Release notes for Java And other Languages
#6608 commented on
Jun 16, 2025 • 0 new comments -
Ray.wait causes node to hang if there are too many object ids
#6403 commented on
Jun 16, 2025 • 0 new comments -
Performance issues with defining remote functions and actor classes from within tasks.
#6240 commented on
Jun 16, 2025 • 0 new comments -
TypeError: can't pickle CudnnModule objects
#5947 commented on
Jun 16, 2025 • 0 new comments -
Profiling ray tasks includes ray initialization time
#5832 commented on
Jun 16, 2025 • 0 new comments -
Make it possible to see resource deadlocks through web UI.
#5789 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] Raise better error message if `ssh_user` is not correct
#5772 commented on
Jun 16, 2025 • 0 new comments -
Code coverage tracker
#5473 commented on
Jun 16, 2025 • 0 new comments -
[ray] ray misuse gpu in docker container
#5245 commented on
Jun 16, 2025 • 0 new comments -
On a background thread, `ray.wait` doesn't timeout until another method on the actor is called
#4934 commented on
Jun 16, 2025 • 0 new comments -
Ray is not propagating variable types correctly
#4463 commented on
Jun 16, 2025 • 0 new comments -
[tune] Support nesting grid_search in lambdas
#3466 commented on
Jun 16, 2025 • 0 new comments -
Retry policy when a worker crashes: a hook missing?
#2635 commented on
Jun 16, 2025 • 0 new comments -
[metrics] Add metrics for debugging Dask-on-Ray
#14372 commented on
Jun 16, 2025 • 0 new comments -
[metrics] Report metrics to be used for debugging load balancing issues
#14369 commented on
Jun 16, 2025 • 0 new comments -
[metrics] Remove unused or unnecessary metrics.
#14366 commented on
Jun 16, 2025 • 0 new comments -
When the node is crashed, logs are not accessible.
#14307 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] SSH command errors aren't written to monitor.out
#14298 commented on
Jun 16, 2025 • 0 new comments -
[dashboard] Add resource usage/availability to the dashboard
#14292 commented on
Jun 16, 2025 • 0 new comments -
[Core] Fix ray::Status <--> gRPC status interplay.
#14278 commented on
Jun 16, 2025 • 0 new comments -
updating worker nodes show as healthy
#14232 commented on
Jun 16, 2025 • 0 new comments -
[Object Spilling] Use subdirectories to avoid large top level inodes for file spilling
#14166 commented on
Jun 16, 2025 • 0 new comments -
[tune] Stack Traces with Function API are really hard to parse
#14162 commented on
Jun 16, 2025 • 0 new comments -
[Object Spilling] Plasma store probably doesn't respect the max shm size.
#14145 commented on
Jun 16, 2025 • 0 new comments -
Latent bugs in command_runner.py
#14139 commented on
Jun 16, 2025 • 0 new comments -
[rllib] undocumented behavior of timers/* in progress.csv
#14052 commented on
Jun 16, 2025 • 0 new comments -
Graceful Placement Group Removal
#14045 commented on
Jun 16, 2025 • 0 new comments -
Improve Docker manual setup document
#14030 commented on
Jun 16, 2025 • 0 new comments -
[UX] Allow passing CPU and GPU to actor and task resources.
#13996 commented on
Jun 16, 2025 • 0 new comments -
Remove cluster_synced_files and file_mounts_sync_continuously
#13967 commented on
Jun 16, 2025 • 0 new comments -
[Object Spilling] Allow to specify max_disk_usage for file system spilling.
#13960 commented on
Jun 16, 2025 • 0 new comments -
[Dashboard] add actor detail to experimental dashboard
#13875 commented on
Jun 16, 2025 • 0 new comments -
ray.put() slows down over time.
#13612 commented on
Jun 16, 2025 • 0 new comments -
[rllib]Action masking with tuple action space
#13592 commented on
Jun 16, 2025 • 0 new comments -
[dask-on-ray] Remove internal Dask API dependencies from the Dask-on-Ray scheduler.
#13560 commented on
Jun 16, 2025 • 0 new comments -
[core] GCS doesn't always cancel worker leases for killed actors
#13545 commented on
Jun 16, 2025 • 0 new comments -
test_autoscaling_policy.py prints out huge pile of JsonErrors
#13433 commented on
Jun 16, 2025 • 0 new comments -
Remove the RAY_CLIENT_MODE flag now that we don't need it
#13279 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler][docs] Explain how the `ray_bootstrap_config` is generated
#15232 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] Don't autofill `setup_commands` if head/worker `setup_commands` are used
#15231 commented on
Jun 16, 2025 • 0 new comments -
[Core] Add gRPC streaming support.
#15219 commented on
Jun 16, 2025 • 0 new comments -
Optimise for num_workers stucks in the infinite loop
#15168 commented on
Jun 16, 2025 • 0 new comments -
Ray dies without a proper error message - "Killed", might have to do with pandas
#15165 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] Simplify Custom ObjectStore Size
#15147 commented on
Jun 16, 2025 • 0 new comments -
[Core] Periodical runner can cause heap-use-after-free
#15141 commented on
Jun 16, 2025 • 0 new comments -
Metric tag keys type inference (Tuple To String)
#15130 commented on
Jun 16, 2025 • 0 new comments -
Actor task hangs after actor crashes with max_task_retries=0
#15045 commented on
Jun 16, 2025 • 0 new comments -
AlphaZero torch model doesn't support cuda, only cpu
#14970 commented on
Jun 16, 2025 • 0 new comments -
[Autoscaler] AWS setup commands hardcodes pip
#14963 commented on
Jun 16, 2025 • 0 new comments -
Support "dry runs" for deploy() operations
#14936 commented on
Jun 16, 2025 • 0 new comments -
[Object Spilling] Failing objects that fail to restore many times.
#14921 commented on
Jun 16, 2025 • 0 new comments -
num_cpus not handled correctly when function has a Queue argument
#14863 commented on
Jun 16, 2025 • 0 new comments -
Make rolling update batch size configurable
#14853 commented on
Jun 16, 2025 • 0 new comments -
Typed handle to deployments
#14810 commented on
Jun 16, 2025 • 0 new comments -
[Core] Docs - run data processing examples in CI
#14769 commented on
Jun 16, 2025 • 0 new comments -
[core] The remote function has been exported 100 times..
#14730 commented on
Jun 16, 2025 • 0 new comments -
Support `ray status CLUSTER.YAML`
#14549 commented on
Jun 16, 2025 • 0 new comments -
Support decoupling task/actor interfaces from implementation
#14529 commented on
Jun 16, 2025 • 0 new comments -
Support specifying container images in runtime_env
#14528 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] Error message not being cleared when autoscaler recovers
#14494 commented on
Jun 16, 2025 • 0 new comments -
[Docs] [tune] WanDB + Ray Integration a bit unclear from the docs
#14478 commented on
Jun 16, 2025 • 0 new comments -
[tune] TBXLoggerCallback not creating necessary directory
#14437 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler][interface] Per-node-type docker configs
#14418 commented on
Jun 16, 2025 • 0 new comments -
[metrics] Utility to easily configure logging for a Ray job/actor/task
#12306 commented on
Jun 16, 2025 • 0 new comments -
`ray dashboard` throws bad exception
#12246 commented on
Jun 16, 2025 • 0 new comments -
[Object Spilling] Tune S3 performance + Add unit tests with moto3
#12232 commented on
Jun 16, 2025 • 0 new comments -
Duplicated IDs are generated
#12197 commented on
Jun 16, 2025 • 0 new comments -
[tune/logging] Warning for Tune
#12140 commented on
Jun 16, 2025 • 0 new comments -
[tune] Restarted Trials Use Incorrect Command When Multiple Commands Run on Cluster/Runtime
#12048 commented on
Jun 16, 2025 • 0 new comments -
[Object spilling] Move LocalObjectManager into the plasma store
#12042 commented on
Jun 16, 2025 • 0 new comments -
[Object spilling] Improve OutOfMemory handling through better memory bookkeeping in plasma store
#12040 commented on
Jun 16, 2025 • 0 new comments -
[Object Spilling] Use compression to reduce IO cost.
#11992 commented on
Jun 16, 2025 • 0 new comments -
[Tune] Add more custom Error Types
#11871 commented on
Jun 16, 2025 • 0 new comments -
Tune report histograms
#11797 commented on
Jun 16, 2025 • 0 new comments -
[Feature Request] [Tune] Add a special 'evaluation-step' flag to avoid unnecessary lengthy evaluations
#11725 commented on
Jun 16, 2025 • 0 new comments -
Unable to create ActorHandle for already created inherited classes object list [java][ray]
#11715 commented on
Jun 16, 2025 • 0 new comments -
Socket connections from GCS stuck in TIME_WAIT after actor death
#11713 commented on
Jun 16, 2025 • 0 new comments -
[docs] tutorial for autoscaling (really basic version)
#11680 commented on
Jun 16, 2025 • 0 new comments -
[flaky] test_multi_node/2 is flaky
#11663 commented on
Jun 16, 2025 • 0 new comments -
[flaky] test_object_manager is flaky
#11661 commented on
Jun 16, 2025 • 0 new comments -
[Core] Reduce the Redis connection per worker.
#11655 commented on
Jun 16, 2025 • 0 new comments -
[flaky] gcs_server test is flaky
#11640 commented on
Jun 16, 2025 • 0 new comments -
Use Pathlib instead of strings in Autoscaler
#11633 commented on
Jun 16, 2025 • 0 new comments -
[tune] PopulationBasedTraining and Tensorboard HPARAMS
#11612 commented on
Jun 16, 2025 • 0 new comments -
AWS Security group rule issue
#11601 commented on
Jun 16, 2025 • 0 new comments -
[dask] Parquet write fails if directory does not exist in advance
#11566 commented on
Jun 16, 2025 • 0 new comments -
[dask] Object store fills up too quickly in simple processing script
#11565 commented on
Jun 16, 2025 • 0 new comments -
[dask/tune] Provide an example of using Dask on Ray with Tune
#11564 commented on
Jun 16, 2025 • 0 new comments -
[Core] Make CoreWorker more unit-testable
#13268 commented on
Jun 16, 2025 • 0 new comments -
Test S3 object spilling on multiple nodes with big data (streaming shuffle)
#13222 commented on
Jun 16, 2025 • 0 new comments -
[core] RAY_HOME path is hardcoded
#13168 commented on
Jun 16, 2025 • 0 new comments -
[Plasma Store]PlasmaClient::Get() return Status::OK() when timeout
#12995 commented on
Jun 16, 2025 • 0 new comments -
Add dashboard to bazel target to avoid running manual build commands
#12956 commented on
Jun 16, 2025 • 0 new comments -
Improve dashboard not found exception
#12955 commented on
Jun 16, 2025 • 0 new comments -
Cannot save training episodes: "TypeError: Object of type ndarray is not JSON serializable"
#12951 commented on
Jun 16, 2025 • 0 new comments -
[Object Spilling] Improve Read throughput
#12950 commented on
Jun 16, 2025 • 0 new comments -
Startup log use autoscaler_log.out / err instead of monitor.log
#12884 commented on
Jun 16, 2025 • 0 new comments -
[New scheduler] Don't assume 1-CPU tasks are feasible
#12870 commented on
Jun 16, 2025 • 0 new comments -
Turn on Test_reference_counting
#12849 commented on
Jun 16, 2025 • 0 new comments -
[Core] Locality-aware leasing: Milestone 3 - Spillback
#12815 commented on
Jun 16, 2025 • 0 new comments -
[Autoscaler] Refactor bin packing routines in autoscaler for code clarity
#12723 commented on
Jun 16, 2025 • 0 new comments -
[Core] Ray.get(timeout=0) doesn't work
#12680 commented on
Jun 16, 2025 • 0 new comments -
[core] Is starvation possible for multi-driver on the same cluster?
#12667 commented on
Jun 16, 2025 • 0 new comments -
GCS server ip error
#12639 commented on
Jun 16, 2025 • 0 new comments -
[core] Support detached/GCS owned objects
#12635 commented on
Jun 16, 2025 • 0 new comments -
[autoscaler] respect max_workers per node type when terminating nodes
#12634 commented on
Jun 16, 2025 • 0 new comments -
[Cluster launcher] Command runner logs are improperly quoted when logged
#12631 commented on
Jun 16, 2025 • 0 new comments -
permissions on rsync'd files are incorrect on worker nodes, results in inability to update workers
#12630 commented on
Jun 16, 2025 • 0 new comments -
[tune] Full experiment checkpointing doesn't work with PBT
#12558 commented on
Jun 16, 2025 • 0 new comments -
New workers are started slowly on a node if running workers >= `num_cpus`
#12525 commented on
Jun 16, 2025 • 0 new comments -
[tune] get_checkpoint_paths fails due to glob command for .tune_metadata file
#12453 commented on
Jun 16, 2025 • 0 new comments -
[New scheduler] Implement dynamic resources
#12433 commented on
Jun 16, 2025 • 0 new comments -
[metrics] Investigate tracing visualization tools
#12314 commented on
Jun 16, 2025 • 0 new comments -
[Autoscaler][GCP] Autoscaler crashing on GCP with error 404.
#30050 commented on
Jun 16, 2025 • 0 new comments -
Setting some system configs causes Ray to fail to start
#29841 commented on
Jun 16, 2025 • 0 new comments -
[Ray Cluster] Assigning all host GPUs into head node without nvidia.com/gpu present
#29753 commented on
Jun 16, 2025 • 0 new comments -
[gcp] "No such container" error after ray up
#29671 commented on
Jun 16, 2025 • 0 new comments -
[Tune] Passing a handle to grid search cause trials to get stuck in running and pending mode
#29545 commented on
Jun 16, 2025 • 0 new comments -
[Serve] `ServeHandles` fail if GCS crashes before first request
#29539 commented on
Jun 16, 2025 • 0 new comments -
[Core] util.multiprocessing.pool scheduling inefficiencies, blocking behavior in imap and imap_unordered
#29453 commented on
Jun 16, 2025 • 0 new comments -
[Core] inspect_serializability bug - parent object serializable but bound method not
#29423 commented on
Jun 16, 2025 • 0 new comments -
[Core] Ray doesn't shutdown properly on KeyboardInterrupt
#29384 commented on
Jun 16, 2025 • 0 new comments -
[Serve] Unable to upload current working directory
#29354 commented on
Jun 16, 2025 • 0 new comments -
[core][observability] Improving reliability of memory_summary API call
#29329 commented on
Jun 16, 2025 • 0 new comments -
InvalidLocationConstraint Message: The specified location-constraint is not valid for storage option
#29309 commented on
Jun 16, 2025 • 0 new comments -
[Core] Worker pool didn't prestart num_cpus workers
#29162 commented on
Jun 16, 2025 • 0 new comments -
[core] use proto for oom error / node died error in the frontend
#28907 commented on
Jun 16, 2025 • 0 new comments -
[ray client] surface ray client logs better
#28890 commented on
Jun 16, 2025 • 0 new comments -
[Backlog][Collective] Facilitate NCCL test in ray cluster
#28860 commented on
Jun 16, 2025 • 0 new comments -
[core/k8s/GKE] Ray schedules actors on pods/nodes that are shutting down
#28852 commented on
Jun 16, 2025 • 0 new comments -
[Core] Cannot 'ray list nodes' after setting the environmental variable 'export RAY_ADDRESS="http://127.0.0.1:8265" '
#28847 commented on
Jun 16, 2025 • 0 new comments -
[AIR] [Tune] Don't add random hash to trial id for single trial
#28830 commented on
Jun 16, 2025 • 0 new comments -
[core] Object returned by a generator with num_returns="dynamic" should throw an error if reconstruction fails
#28688 commented on
Jun 16, 2025 • 0 new comments -
[P0] test_submit_cpp_job failed in osx
#28592 commented on
Jun 16, 2025 • 0 new comments -
[Ray: Core] - Unable to enable TLS on the ray head node
#28534 commented on
Jun 16, 2025 • 0 new comments -
Dashboard / Jobs RegexMatcher ignores "includes".
#28502 commented on
Jun 16, 2025 • 0 new comments -
[Core, RLlib] RLlib uses Metal GPU even when told not to
#28385 commented on
Jun 16, 2025 • 0 new comments -
[Ray Core] Actor Handles not properly passed to Actors created by other Actors
#32848 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] A3C has problems with the horizon option removed
#32812 commented on
Jun 16, 2025 • 0 new comments -
[Core][Object Store] Object Store to manage files in the cluster
#32694 commented on
Jun 16, 2025 • 0 new comments -
[Clusters] [KubeRay] problem with pending actors' pods in Kubernetes
#32651 commented on
Jun 16, 2025 • 0 new comments -
[core] Lock contention when submitting actor task on the client queue
#32595 commented on
Jun 16, 2025 • 0 new comments -
[Core] Install via `pip` fails, install with `conda` crashes worker and exits
#32423 commented on
Jun 16, 2025 • 0 new comments -
[Core] "ImportError: No module named ray" when using `ray submit`
#31924 commented on
Jun 16, 2025 • 0 new comments -
[workflow] memory leakage
#31819 commented on
Jun 16, 2025 • 0 new comments -
[Clusters] [RLlib] Trainer Object running on Worker node & RolloutWorker running on Head node
#31808 commented on
Jun 16, 2025 • 0 new comments -
[runtime envs] Ray Client Server failed when starting
#31622 commented on
Jun 16, 2025 • 0 new comments -
Huge numbers of "deleted" files with open processes left after Ray Tune run
#31556 commented on
Jun 16, 2025 • 0 new comments -
[Tune] Reenable `zoopt` searcher test after fixes for handling invalid results are included in its next release
#31439 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Pytorch multiple optimizers
#31428 commented on
Jun 16, 2025 • 0 new comments -
In the docker bridge mode, pulling the actor on a non head node fails.
#31308 commented on
Jun 16, 2025 • 0 new comments -
[CORE] Unable to run celery task containing ray tasks
#31157 commented on
Jun 16, 2025 • 0 new comments -
[core] Segfaults when restarting Ray multiple times in unit tests with background threads running
#31145 commented on
Jun 16, 2025 • 0 new comments -
[core] Error with Slurm: No available node types can fulfill resource request {'node:<ip>': 0.01}.
#31135 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Not able to save evaluation recording videos
#30949 commented on
Jun 16, 2025 • 0 new comments -
[Ray Job] SchedulingCancelled for JobSupervisor Actor
#30898 commented on
Jun 16, 2025 • 0 new comments -
[Ray client] Ray Zombie Process Issue
#30894 commented on
Jun 16, 2025 • 0 new comments -
[Devprod] Bazel reports an error when compiling as a non-root user
#30885 commented on
Jun 16, 2025 • 0 new comments -
[release tests] Prometheus metrics collection sometimes takes 15min to run for long_running_node_failures
#30859 commented on
Jun 16, 2025 • 0 new comments -
[core] Disk full error logging is verbose
#30833 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Error when running RLlib
#30412 commented on
Jun 16, 2025 • 0 new comments -
[Cluster Launcher] `ray dashboard` CLI command does not stop port-forwarding after Ctrl+C
#30385 commented on
Jun 16, 2025 • 0 new comments -
[Core] Restore objects directly from S3
#24581 commented on
Jun 16, 2025 • 0 new comments -
[Ray component: Core] Dask on Ray - Worker processes go to idle state and not garbage collected when used with RayProgressBar()
#24556 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] `serialized_env` used as ID, but identical envs can produce different `serialized_env`
#24515 commented on
Jun 16, 2025 • 0 new comments -
[Core] No overloads for "remote" match the provided arguments
#24371 commented on
Jun 16, 2025 • 0 new comments -
Workflows: Type stubs are incorrect: argument missing for parameter status_filter
#24367 commented on
Jun 16, 2025 • 0 new comments -
[Core] /api/cluster_status treats placement groups differently than ray status
#24309 commented on
Jun 16, 2025 • 0 new comments -
[Core] Restore worker silently fails and the program is stuck
#24248 commented on
Jun 16, 2025 • 0 new comments -
[RLlib][Bug] duplicate action unsquashing in DDPG / TD3 policy
#24213 commented on
Jun 16, 2025 • 0 new comments -
[Tune] support for FIRE PBT
#24137 commented on
Jun 16, 2025 • 0 new comments -
[Tune] Tune Job hangs out and can't finish the tune job
#23858 commented on
Jun 16, 2025 • 0 new comments -
[Workflows] Cant use custom storage backends
#23831 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Add Option for Custom Sample Preprocessing when Sampling from Replay Buffer
#23815 commented on
Jun 16, 2025 • 0 new comments -
[Core][Bug] global-scoped actor handles/Ray objects prevents Ray workers from being destructed.
#23677 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] `zip_directory` `excludes` parameter doesn't work with absolute paths
#23473 commented on
Jun 16, 2025 • 0 new comments -
[Train] [Feature] Print useful traceback on SIGINT
#23148 commented on
Jun 16, 2025 • 0 new comments -
[Train] [Docs] Document how to change logging verbosity
#23147 commented on
Jun 16, 2025 • 0 new comments -
[docs][Bug] Workflow docs have few typos and type issue
#23113 commented on
Jun 16, 2025 • 0 new comments -
[tune][Feature] add tune.choices to select multiple values from a search space
#23001 commented on
Jun 16, 2025 • 0 new comments -
[Bug] An exception in a task cannot be caught with ActorPool.map_unordered making restarting meaningless
#22978 commented on
Jun 16, 2025 • 0 new comments -
Ray Train / Tune - W&B logger documentation
#22881 commented on
Jun 16, 2025 • 0 new comments -
[Train] update `logdir` relative path
#22753 commented on
Jun 16, 2025 • 0 new comments -
[Bug][placement groups] Actor scheduling does not respect placement_group=None
#22742 commented on
Jun 16, 2025 • 0 new comments -
[Cluster snapshot] [Bug] `runtime_env` fields in cluster snapshot are converted to camelcase when they should not be
#22565 commented on
Jun 16, 2025 • 0 new comments -
[Train] Add flags to disable creating log directories
#22261 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] [Feature] Support for having parametric action spaces/action masking for continuous action space models
#22259 commented on
Jun 16, 2025 • 0 new comments -
[Core] Actor methods will be modified for tracing even if tracing is not enabled.
#28293 commented on
Jun 16, 2025 • 0 new comments -
[Runtime] Improve runtime environment error message when virtualenv version is too old
#28232 commented on
Jun 16, 2025 • 0 new comments -
[Core] Multi-Threaded Actors are Un-Killable
#28086 commented on
Jun 16, 2025 • 0 new comments -
[Autoscaler] Assigning None to optional keys leads to failure
#28012 commented on
Jun 16, 2025 • 0 new comments -
[Core] Can't pickle objects defined in top-level environment
#28000 commented on
Jun 16, 2025 • 0 new comments -
[Doc] [Serve] Serve Loki monitoring tutorial screenshot has outdated API
#27453 commented on
Jun 16, 2025 • 0 new comments -
[core] Very slow task scheduling during Dataset.sort on 100TB
#27410 commented on
Jun 16, 2025 • 0 new comments -
Is Ray going to support Weighted Quantile Sketches or Quantile Sketches?
#27363 commented on
Jun 16, 2025 • 0 new comments -
[Core] Raylet continually exiting on worker in docker
#26576 commented on
Jun 16, 2025 • 0 new comments -
Tensorboard with Docker from Ray dashboard, tune tab cannot be accessed
#26325 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Eval episode runs forever if Env doesn't terminate properly
#26241 commented on
Jun 16, 2025 • 0 new comments -
[Ray Client] Using many concurrent client connections results in deadlock/hanging
#26144 commented on
Jun 16, 2025 • 0 new comments -
[Core] worker died randomly and unexpectedly under heavy workload (Check failed: inner_it->second.mutable_nested()->contained_in_borrowed_ids.erase(id))
#26128 commented on
Jun 16, 2025 • 0 new comments -
[Core][HA] Actor entries are not deleted from the storage permanently if GCS is crashed.
#26114 commented on
Jun 16, 2025 • 0 new comments -
Unclear error when using generator tasks
#25836 commented on
Jun 16, 2025 • 0 new comments -
[Core] SIGSEGV when I run experimental shuffle command.
#25650 commented on
Jun 16, 2025 • 0 new comments -
[Core][Metrics] Prometheus-client not working with the latest version.
#25523 commented on
Jun 16, 2025 • 0 new comments -
[core] Scheduler stalls during shuffle reduce stage with 100k concurrent tasks or more
#25412 commented on
Jun 16, 2025 • 0 new comments -
[AIR] Utilities to go from Predictor to `BatchPredictor` and `ModelWrapperDeployment`
#24977 commented on
Jun 16, 2025 • 0 new comments -
[Train/AIR] Ray Train actors still use up resources after Notebook cell is stopped
#24947 commented on
Jun 16, 2025 • 0 new comments -
[Core] Failed to delete named actor in client mode
#24906 commented on
Jun 16, 2025 • 0 new comments -
[AIR] Add a `reconfigure` option to `ModelWrapperDeployment`
#24869 commented on
Jun 16, 2025 • 0 new comments -
[core] Uninformative error for unserialisable objects
#24863 commented on
Jun 16, 2025 • 0 new comments -
[Serve] Prototype C++ Worker in Serve
#24738 commented on
Jun 16, 2025 • 0 new comments -
[Core] Spilling performance regression in large-scale shuffle
#24667 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] User guides are not ordered
#41340 commented on
Jun 16, 2025 • 0 new comments -
error installing library
#41223 commented on
Jun 16, 2025 • 0 new comments -
[core][state][dashboard] Better tasks info GC control at GCS
#41142 commented on
Jun 16, 2025 • 0 new comments -
[RLLib] External simulator: mean episode reward is NaN due to done not set
#40954 commented on
Jun 16, 2025 • 0 new comments -
[Core] - Cannot install in tiny core linux
#40832 commented on
Jun 16, 2025 • 0 new comments -
[Tune|RLlib] Add error-tolerant version of PB2
#40787 commented on
Jun 16, 2025 • 0 new comments -
NCCL Proxy Call to rank 1 failed - on Cloud VM Docker setup for huggingface distributed ray train script
#40758 commented on
Jun 16, 2025 • 0 new comments -
[Ray Train] - Add Options to Save Last checkpoint in Ray Train Checkpointing Config
#40503 commented on
Jun 16, 2025 • 0 new comments -
ray.init() can sometimes hang with a limited range specified for --worker-port-list
#40497 commented on
Jun 16, 2025 • 0 new comments -
[Core] Dead session not closed
#40482 commented on
Jun 16, 2025 • 0 new comments -
[RLlib][MBMPO] The algorithm does not learn as intended.
#40400 commented on
Jun 16, 2025 • 0 new comments -
[Tune] Support for new algorithm: Cost-Aware Pareto Region Bayesian Search (CARBS).
#40356 commented on
Jun 16, 2025 • 0 new comments -
[Workflow] Incorrectly set max_calls in options
#40252 commented on
Jun 16, 2025 • 0 new comments -
[PPOConfig] Utilising new API/models without matching documentation
#40201 commented on
Jun 16, 2025 • 0 new comments -
[Rllib] Tune locks up when attempting to create an rllib algorithm in a trainable
#40015 commented on
Jun 16, 2025 • 0 new comments -
[Tune/Air] Memory Leak when using WandbLoggerCallback with Population Based Tuning
#40014 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] TD3/DDPG doesn't seem to respect action space bounds (at least initially)?
#40002 commented on
Jun 16, 2025 • 0 new comments -
[RLLIB] Issue with AlphaZero algorithm Stateless CartPole
#39937 commented on
Jun 16, 2025 • 0 new comments -
[RLLIB] Error in executing StatelessCartPole environment with AlphaZero
#39862 commented on
Jun 16, 2025 • 0 new comments -
Allow train_loop_config to be a dataclass / pydantic model
#39824 commented on
Jun 16, 2025 • 0 new comments -
[Core] ResolutionImpossible - Test requirements appear to not fit versions
#39782 commented on
Jun 16, 2025 • 0 new comments -
Job history is lost when Ray cluster is restarted (via kuberay)
#39764 commented on
Jun 16, 2025 • 0 new comments -
Ray::Tune::Logger::Tensorboardx
#39741 commented on
Jun 16, 2025 • 0 new comments -
[Core] Upgrading grpc to 1.57.0 causes perf regressions
#39679 commented on
Jun 16, 2025 • 0 new comments -
ray failed to register worker when I used vllm
#39618 commented on
Jun 16, 2025 • 0 new comments -
WARNING deprecation.py:50 -- DeprecationWarning: `ray.rllib.execution.train_ops.multi_gpu_train_one_step` has been deprecated. This will raise an error in the future!
#43250 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Build_for_inference() in env_runner_v2.py created empty state_out_1 and lead to failure of initiation
#42978 commented on
Jun 16, 2025 • 0 new comments -
Core: ray.remote raises ValueError when used on torch IterableDataset
#42914 commented on
Jun 16, 2025 • 0 new comments -
Core: Join zombie subprocesses after task completion
#42913 commented on
Jun 16, 2025 • 0 new comments -
[Core] SIGSEGV when running Ray
#42868 commented on
Jun 16, 2025 • 0 new comments -
[Core] Serialisation does not work with classes with `__init_subclass__`
#42823 commented on
Jun 16, 2025 • 0 new comments -
Problem with YOLOv8 Hyperparameters tuning
#42770 commented on
Jun 16, 2025 • 0 new comments -
[RLLIB] Passing configuration to Custom Environment in rllib is giving an error
#42753 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Algorithms ES, A3C are deprecated and replacement does not exist in python package
#42579 commented on
Jun 16, 2025 • 0 new comments -
[<Ray component: Core|RLlib|etc...>] Inite state of attention_net.py is empty
#42569 commented on
Jun 16, 2025 • 0 new comments -
[<Ray component: Core|RLlib|etc...>] KeyError with RNN
#42501 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] gpu cannot enable
#42388 commented on
Jun 16, 2025 • 0 new comments -
[<Ray component: Core|RLlib|etc...>] reslink in model
#42333 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] shape [] in Box action space not supported.
#42199 commented on
Jun 16, 2025 • 0 new comments -
Building an executable using Ray and Cx_freeze
#42101 commented on
Jun 16, 2025 • 0 new comments -
RichProgressBar in PyTorch Lightning only show progress at the very end
#42091 commented on
Jun 16, 2025 • 0 new comments -
[<Ray component: Core|RLlib|etc...>] Channel errore
#42089 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] When setting `config.environment(normalize_actions=False)` and using CQL, it raises an error: `AttributeError: 'TorchDiagGaussian' object has no attribute 'sample_logp'`.
#42064 commented on
Jun 16, 2025 • 0 new comments -
[Workflow] get_metadata() returns RUNNING instead of RESUMABLE status
#41980 commented on
Jun 16, 2025 • 0 new comments -
Ray IDs vs endianness?
#41961 commented on
Jun 16, 2025 • 0 new comments -
"RaySystemError: System error: Unknown error"
#41786 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Value error while running DQN
#41559 commented on
Jun 16, 2025 • 0 new comments -
Is there an error in the interface name " cdef cppclass CFunctionDescriptorInterface "ray::CFunctionDescriptorInterface "?
#41553 commented on
Jun 16, 2025 • 0 new comments -
Saving XGBoost model with json extension
#41374 commented on
Jun 16, 2025 • 0 new comments -
[data] date32 and datetime64 handling should be the same
#41358 commented on
Jun 16, 2025 • 0 new comments -
[data] Report actual task time and object sizes in Dataset.stats()
#36671 commented on
Jun 16, 2025 • 0 new comments -
[CI][Docs] Example in Train FAQ is flakey
#36399 commented on
Jun 16, 2025 • 0 new comments -
Be consistent on whether or not you include a dot at the end of a bullet list element.
#36308 commented on
Jun 16, 2025 • 0 new comments -
[Core] ray.put and ray.get extremely slow with polars frames
#36068 commented on
Jun 16, 2025 • 0 new comments -
[Ray Core] There is a Exception error message bug which convert byte array to String.
#35880 commented on
Jun 16, 2025 • 0 new comments -
System error: Ray has not been started yet. You can start Ray with 'ray.init()'
#35592 commented on
Jun 16, 2025 • 0 new comments -
[Client] Dataset write_csv AttributeError: ‘Worker’ object has no attribute 'core_worker'
#35537 commented on
Jun 16, 2025 • 0 new comments -
Ray: Data - Cannot read json its written to s3
#35501 commented on
Jun 16, 2025 • 0 new comments -
[Core] `OwnerDiedError` if dataset owner actor handle get out of scope
#35262 commented on
Jun 16, 2025 • 0 new comments -
[VM launcher] Automtically shut down the ec2 machine when I stop ray up in the middle
#35013 commented on
Jun 16, 2025 • 0 new comments -
[Core] Incorrect detection of cpus
#34846 commented on
Jun 16, 2025 • 0 new comments -
[Clusters] - Cannot switch off rsync during Cluster Launch with `ray up`
#34390 commented on
Jun 16, 2025 • 0 new comments -
Azure autoscaler cannot create additional nodes
#34198 commented on
Jun 16, 2025 • 0 new comments -
[Core] Error in external storage writing for object spilling
#33913 commented on
Jun 16, 2025 • 0 new comments -
get_node_to_storage_syncer has an empty docstring
#33841 commented on
Jun 16, 2025 • 0 new comments -
[ Core ] Correct usage of min/max-worker-port arguments
#33749 commented on
Jun 16, 2025 • 0 new comments -
Core: nightly builds for macos only include an x86 _raylet.so even though they claim to be universal
#33720 commented on
Jun 16, 2025 • 0 new comments -
[Core] The resources have minus values in ray status output
#33569 commented on
Jun 16, 2025 • 0 new comments -
[tune] tqdm/Hyperopt-style TuneReporter for Databricks notebooks
#33519 commented on
Jun 16, 2025 • 0 new comments -
[Core] Ray client doesn't support `should_capture_child_tasks_in_placement_group` API
#33513 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] DictFlatteningPreprocessor order is inconsistent leads to invalid mapping of OBS
#33327 commented on
Jun 16, 2025 • 0 new comments -
[Dashboard] Head node exited unexceptly because of dashboard process exited
#31261 commented on
Jun 16, 2025 • 0 new comments -
[runtime env] Raise warning when using `runtime_env` with `local_mode=True`
#33260 commented on
Jun 16, 2025 • 0 new comments -
[Core] `get_runtime_context()` in task fails with unhelpful error "cannot pickle '_thread.lock' object"
#32987 commented on
Jun 16, 2025 • 0 new comments -
[Train] Benchmark testing on Mosaic Composer with Ray
#32946 commented on
Jun 16, 2025 • 0 new comments -
[rllib] Action space MultiDiscrete([11 5 1 2]) is not supported for DQN
#39571 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Support JAX-(numpy)-based envs.
#39528 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Ray RLLib Dependencies Version Information
#39405 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] dreamerv3 causes debug code to be executed when running tune
#39302 commented on
Jun 16, 2025 • 0 new comments -
[Core] CPP Interface crashes on Ray.Init()
#39252 commented on
Jun 16, 2025 • 0 new comments -
ValueError: Must set agent_id on policy config
#39246 commented on
Jun 16, 2025 • 0 new comments -
[Core] Actor retry count is consumed because the task is retried when actor is still alive.
#39110 commented on
Jun 16, 2025 • 0 new comments -
[Core] Memory Leak
#38877 commented on
Jun 16, 2025 • 0 new comments -
[Tune] Leaky core concepts in Ray Tune documentation
#38781 commented on
Jun 16, 2025 • 0 new comments -
latest ray microbenchmark fails
#38758 commented on
Jun 16, 2025 • 0 new comments -
Ray Memory Usage Keeps Increasing even after Manual Garbage Collection
#38730 commented on
Jun 16, 2025 • 0 new comments -
[docs] Document Tune/Train placement group
#38706 commented on
Jun 16, 2025 • 0 new comments -
ray/RLlib/offline/estimators
#38357 commented on
Jun 16, 2025 • 0 new comments -
[Core] gcs_server Failed accept4: Too many open files
#38248 commented on
Jun 16, 2025 • 0 new comments -
[Core] Segfault from fibers when using streaming/dynamic generator (only happening from test_streaming_generator_exception)
#38167 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] RL module and PPO implementation
#38012 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] ray 2.6 relies on tf.bool which does not exist in tensorflow 2.13
#37895 commented on
Jun 16, 2025 • 0 new comments -
[RLlib] Sampler takes first step before next batch is requested
#37893 commented on
Jun 16, 2025 • 0 new comments -
[Data] Ray 2.6 created a breaking change in the index of a Modin DataFrame
#37771 commented on
Jun 16, 2025 • 0 new comments -
[Ray-Java client] Call actor report 'No module named' with py script
#37600 commented on
Jun 16, 2025 • 0 new comments -
[Core] Ray cpp example, if not call ray::Shutdown when exit, will cause segment fault.
#37596 commented on
Jun 16, 2025 • 0 new comments -
RLLib: Training Rllib-DDPG with custom environment leads error in Inference.
#37242 commented on
Jun 16, 2025 • 0 new comments -
[<Ray component: autoscaler>] _load_kubernetes_defaults_config function is not yet made
#37033 commented on
Jun 16, 2025 • 0 new comments -
[Core] Activate Ray tracing casue error when calling actor method with decorator with wraps. (i propose the possible solution)
#36891 commented on
Jun 16, 2025 • 0 new comments -
[Core] No dependency on setuptools results in broken build
#36742 commented on
Jun 16, 2025 • 0 new comments