Description
Describe the issue:
The API documentation for dask-ml's train_test_split states that blockwise=False is supported for Dask Arrays:
"For Dask Arrays, set blockwise=False to shuffle data between blocks as well."
https://ml.dask.org/modules/generated/dask_ml.model_selection.train_test_split.html#dask_ml.model_selection.train_test_split
I think this is also the intention of the code, which delegates the job to ShuffleSplit:
dask-ml/dask_ml/model_selection/_split.py
Line 490 in 567cfd7
However, ShuffleSplit does not support blockwise=False:
dask-ml/dask_ml/model_selection/_split.py
Line 194 in 567cfd7
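To illustrate the behavior the documentation describes, here is a minimal pure-Python sketch of the difference between a blockwise split and one that shuffles data between blocks. This is not dask-ml's implementation: the function names blockwise_split and global_split are hypothetical, and plain lists stand in for the chunks of a Dask Array.

```python
import random


def blockwise_split(blocks, test_size, seed=0):
    """Split each block independently; samples never leave their block."""
    rng = random.Random(seed)
    train, test = [], []
    for block in blocks:
        shuffled = list(block)
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_size)
        test.append(shuffled[:n_test])
        train.append(shuffled[n_test:])
    return train, test


def global_split(blocks, test_size, seed=0):
    """Shuffle all samples together, so they can move between blocks."""
    rng = random.Random(seed)
    flat = [item for block in blocks for item in block]
    rng.shuffle(flat)
    n_test = int(len(flat) * test_size)
    return flat[n_test:], flat[:n_test]


# Same layout as the example in this report: 8 elements in chunks of 4.
blocks = [[0, 1, 2, 3], [4, 5, 6, 7]]
train_blocks, test_blocks = blockwise_split(blocks, test_size=0.25)
train_flat, test_flat = global_split(blocks, test_size=0.25)
```

In the blockwise case every test sample comes from the same block as the training samples next to it, whereas the global variant can draw both test samples from a single block. The second behavior is what I would expect blockwise=False to provide.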
Minimal Complete Verifiable Example:
from dask_ml.model_selection import train_test_split
import dask.array as da
x = da.arange(8, chunks=4)
train_test_split(x, blockwise=False)
....
NotImplementedError: ShuffleSplit with blockwise=False
has not been implemented yet.
Environment:
- Dask version: 2024.4.4
- Python version: 3.9.18
- Operating System:
- Install method (conda, pip, source): pip