Enable setting OS disk size in Azure by VikParuchuri · Pull Request #45867 · ray-project/ray

Open · wants to merge 1 commit into master
Conversation

@VikParuchuri commented Jun 11, 2024

Why are these changes needed?

Currently, the default OS disks on Azure have a capacity of around 30GB. Because many Ray Docker images and their support files are roughly that size, cluster creation and tasks often fail; even when creation succeeds, the small disk leaves little room for object spilling.

This PR makes the OS disk size configurable to work around these limitations, and raises the default OS disk size to 64GB.
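For context, here is a minimal sketch of how such a setting is typically wired into an Azure ARM template: the diskSizeGB parameter (whose declaration matches the diff hunk quoted later in this thread) feeds the VM's storageProfile.osDisk.diskSizeGB field. The abridged resource layout below is an illustration based on standard ARM conventions, not the literal contents of this PR.

```json
{
  "parameters": {
    "diskSizeGB": {
      "type": "int",
      "defaultValue": 64,
      "metadata": { "description": "OS disk size in GB for head and worker VMs" }
    }
  },
  "resources": [
    {
      "type": "Microsoft.Compute/virtualMachines",
      "properties": {
        "storageProfile": {
          "osDisk": {
            "createOption": "FromImage",
            "diskSizeGB": "[parameters('diskSizeGB')]"
          }
        }
      }
    }
  ]
}
```

With an explicit diskSizeGB, Azure provisions the OS managed disk at the requested size instead of the image default (around 30GB for many marketplace images, as noted above).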

Related issue number

This is a related discussion.

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Vik Paruchuri <github@vikas.sh>
@kekulai-fredchang (Contributor) commented:

@gramhagen

can we get an approval or your team's perspective on this?

  • I have been encountering stability issues with sftp fsspec mounts, and a good fallback is to download large files to disk and then load them into Ray Data pipelines. Being able to set the OS disk size would give Azure an equivalent of the existing AWS cluster config options (see the sketch after this list).

  • Note that the default value of 64GB may be too small, since current Azure defaults deploy at 150GB.
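For reference, this is the AWS-side option the first bullet alludes to: a Ray AWS cluster config forwards node_config to EC2, so the root volume size can be set with the standard BlockDeviceMappings structure. The fragment below is only a sketch of that structure (the device name, 140GB size, and gp3 type are illustrative values, not anything from this PR); in a cluster YAML the same fields typically sit under available_node_types.<type>.node_config.

```json
{
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/sda1",
      "Ebs": {
        "VolumeSize": 140,
        "VolumeType": "gp3",
        "DeleteOnTermination": true
      }
    }
  ]
}
```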

@gramhagen (Contributor) commented:

this sounds like a good idea to me!

@kekulai-fredchang (Contributor) commented Dec 5, 2024

@ericl @architkulkarni @hongchaodeng

12/4/2024: I confirmed this PR works with Ray 2.39.0 on Azure, for both head and worker nodes.

However, @VikParuchuri, please modify your PR so the JSON passes a null when the user does not set the disk size, since the default sizes seem to creep up over time; current defaults are at 150GB.

A Collaborator left a review comment on the following hunk of the Azure VM template parameters:

    @@ -44,6 +44,13 @@
             "description": "The version of the VM image"
           }
         },
    +    "diskSizeGB": {
    +      "type": "int",
    +      "defaultValue": 64,

> However, @VikParuchuri, please modify your PR so the JSON passes a null when the user does not set the disk size, since the default sizes seem to creep up over time; current defaults are at 150GB.

We should address this comment.
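One possible way to satisfy that request, sketched below under the assumption that the template keeps an int parameter: use a sentinel default (0 here, meaning "not set") and emit null for diskSizeGB via ARM's json('null') conditional, which causes the property to be omitted so Azure falls back to the image's own disk size. The parameter name, sentinel choice, and surrounding layout are illustrative assumptions, not the PR's actual code.

```json
{
  "parameters": {
    "diskSizeGB": {
      "type": "int",
      "defaultValue": 0,
      "metadata": { "description": "OS disk size in GB; 0 means keep the image default" }
    }
  },
  "resources": [
    {
      "type": "Microsoft.Compute/virtualMachines",
      "properties": {
        "storageProfile": {
          "osDisk": {
            "createOption": "FromImage",
            "diskSizeGB": "[if(equals(parameters('diskSizeGB'), 0), json('null'), parameters('diskSizeGB'))]"
          }
        }
      }
    }
  ]
}
```

This avoids hard-coding a number that has to chase Azure's image defaults (64GB in the PR versus the 150GB defaults mentioned above).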

@jjyao added the go (add ONLY when ready to merge, run all tests) label on Dec 11, 2024
stale bot commented Jan 22, 2025

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

stale bot added the stale label on Jan 22, 2025
@jjyao (Collaborator) commented Mar 25, 2025

@VikParuchuri do you have time to address the review comments? Thanks!

stale bot removed the stale label on Mar 25, 2025
@cszhu added the core-autoscaler (autoscaler related issues) and core (Issues that should be addressed in Ray Core) labels on Apr 3, 2025
@hainesmichaelc added the community-contribution (Contributed by the community) label on Apr 4, 2025
stale bot commented May 6, 2025

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

stale bot added the stale label on May 6, 2025
stale bot removed the stale label on May 27, 2025
@can-anyscale (Collaborator) commented:

stale pr, let me know once you update it

@masoudcharkhabi self-requested a review on June 10, 2025 18:05
github-actions bot commented:

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 25, 2025
Labels
  • community-contribution: Contributed by the community
  • core: Issues that should be addressed in Ray Core
  • core-autoscaler: autoscaler related issues
  • go: add ONLY when ready to merge, run all tests
  • stale: The issue is stale. It will be closed within 7 days unless there are further conversation
7 participants