[air] Add _max_cpu_fraction_per_node to ScalingConfig and documentation #26634

ericl · 2022-07-16T23:57:37Z

Why are these changes needed?

As a follow-up to #26397, this adds _max_cpu_fraction_per_node to the docs and the API as an experimental feature.

Signed-off-by: Eric Liang <ekhliang@gmail.com>

ericl · 2022-07-16T23:58:43Z

python/ray/train/tests/test_base_trainer.py

+    tune.run(trainer.as_trainable(), num_samples=4)
+
+
+# TODO(ekl/sang) this currently fails.


@rkooo567, it seems this fails because all CPUs end up excluded. Can we ensure that at least 1 CPU remains available on each side (placement groups and tasks outside them), no matter how aggressive the fraction is?

Btw, I think we should disallow 0.0 and 1.0 as values (raise ValueError).

Filed #26635
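A minimal sketch of the two constraints proposed above. It assumes the fraction caps how many of a node's CPUs placement groups may reserve, with the remainder left for tasks outside placement groups (such as dataset execution). `validate_max_cpu_fraction` and `placement_group_cpu_limit` are hypothetical helpers for illustration only, not Ray's actual implementation.

```python
import math


def validate_max_cpu_fraction(fraction: float) -> None:
    """Hypothetical check: reject 0.0, 1.0, and anything outside (0, 1)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError(
            f"_max_cpu_fraction_per_node must be strictly between 0 and 1, "
            f"got {fraction!r}"
        )


def placement_group_cpu_limit(node_cpus: int, fraction: float) -> int:
    """Hypothetical helper: CPUs on one node that placement groups may reserve.

    The raw limit floor(node_cpus * fraction) is clamped so that at least
    1 CPU stays usable by placement groups and at least 1 CPU stays free
    for tasks outside them, however aggressive the fraction is.
    """
    validate_max_cpu_fraction(fraction)
    if node_cpus < 2:
        # With a single CPU, both sides cannot each keep one CPU.
        raise ValueError("need at least 2 CPUs to keep 1 CPU on each side")
    raw_limit = math.floor(node_cpus * fraction)
    return max(1, min(node_cpus - 1, raw_limit))


print(placement_group_cpu_limit(4, 0.8))    # 3 for placement groups, 1 left free
print(placement_group_cpu_limit(16, 0.01))  # clamped up from 0 to 1
```

Clamping small fractions up to 1 CPU, instead of letting floor() round them to 0, avoids the "all CPUs excluded" failure seen in the TODO test, while the ValueError covers the 0.0 and 1.0 cases.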

matthewdeng · 2022-07-17T00:14:46Z

doc/source/data/key-concepts.rst

+    ``_max_cpu_fraction_per_node`` is experimental and not recommended for use with
+    autoscaling clusters.


Should we be more explicit and state the reason, i.e. that doing so may cause a deadlock?

ericl added 4 commits July 16, 2022 16:43

wip

e82ba16

Signed-off-by: Eric Liang <ekhliang@gmail.com>

update docs

3ec4f47

Signed-off-by: Eric Liang <ekhliang@gmail.com>

add tests

4eddf99

Signed-off-by: Eric Liang <ekhliang@gmail.com>

lint

704f2f9

Signed-off-by: Eric Liang <ekhliang@gmail.com>

ericl requested review from scv119, clarkzinzow, jjyao, jianoaix, maxpumperla, pcmoritz, richardliaw, edoakes and simon-mo as code owners July 16, 2022 23:57

ericl commented Jul 16, 2022

ericl assigned matthewdeng and rkooo567 Jul 16, 2022

matthewdeng approved these changes Jul 17, 2022

ericl added 3 commits July 16, 2022 17:41

warn

8ca00e0

Signed-off-by: Eric Liang <ekhliang@gmail.com>

Merge remote-tracking branch 'upstream/master' into ingest-lim

9997e08

update

8bb2033

Signed-off-by: Eric Liang <ekhliang@gmail.com>

ericl force-pushed the ingest-lim branch from 973bf74 to 8bb2033 July 17, 2022 00:42

ericl added 4 commits July 16, 2022 17:43

update

e9d0882

Signed-off-by: Eric Liang <ekhliang@gmail.com>

update

a33aa7e

Signed-off-by: Eric Liang <ekhliang@gmail.com>

update

875e1d7

Signed-off-by: Eric Liang <ekhliang@gmail.com>

update

b9de278

Signed-off-by: Eric Liang <ekhliang@gmail.com>

richardliaw reviewed Jul 17, 2022

python/ray/util/placement_group.py (resolved)

scv119 approved these changes Jul 17, 2022

richardliaw approved these changes Jul 17, 2022

fix test

865cf75

Signed-off-by: Eric Liang <ekhliang@gmail.com>

ericl added the do-not-merge label Jul 17, 2022

ericl merged commit 400330e into ray-project:master Jul 17, 2022

ericl mentioned this pull request Jul 17, 2022

[air] Add a warning if no CPUs are reserved for dataset execution #26643

Merged

jianoaix pushed a commit to jianoaix/ray that referenced this pull request Jul 18, 2022

[air] Add _max_cpu_fraction_per_node to ScalingConfig and documentation (ray-project#26634)

41295f4

Signed-off-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>

xwjiang2010 pushed a commit to xwjiang2010/ray that referenced this pull request Jul 19, 2022

[air] Add _max_cpu_fraction_per_node to ScalingConfig and documentation (ray-project#26634)

428ca72

Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>

Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022

[air] Add _max_cpu_fraction_per_node to ScalingConfig and documentation (ray-project#26634)

82cecac

Signed-off-by: Stefan van der Kleij <s.vanderkleij@viroteq.com>
