Documentation Issue with train_test_split and blockwise · Issue #999 · dask/dask-ml · GitHub
Documentation Issue with train_test_split and blockwise #999
Open
@christhorn2

Describe the issue:

The API documentation for dask-ml's train_test_split states that blockwise=False is supported for Dask Arrays:
"For Dask Arrays, set blockwise=False to shuffle data between blocks as well."
https://ml.dask.org/modules/generated/dask_ml.model_selection.train_test_split.html#dask_ml.model_selection.train_test_split

I think this is also the intent of the code, which delegates the job to ShuffleSplit:

elif all(isinstance(arr, da.Array) for arr in arrays):

However, ShuffleSplit does not support blockwise=False:

def _split(self, X):

Minimal Complete Verifiable Example:

from dask_ml.model_selection import train_test_split
import dask.array as da
x = da.arange(8, chunks=4)
train_test_split(x, blockwise=False)
....
NotImplementedError: ShuffleSplit with blockwise=False has not been implemented yet.
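
For reference, here is a minimal plain-Python sketch of the difference I would expect between the two modes, based on the documentation wording. It is not dask-ml's implementation: the helper names `split_blockwise` and `split_global`, the 25% test fraction, and the use of nested lists in place of array chunks are all assumptions made for illustration.

```python
import random


def split_blockwise(blocks, test_fraction, seed=0):
    """Shuffle and split each block on its own (hypothetical blockwise=True behaviour)."""
    rng = random.Random(seed)
    train, test = [], []
    for block in blocks:
        items = list(block)
        rng.shuffle(items)
        n_test = int(len(items) * test_fraction)
        # Test and train rows for this block come only from this block.
        test.append(items[:n_test])
        train.append(items[n_test:])
    return train, test


def split_global(blocks, test_fraction, seed=0):
    """Shuffle all rows together, then split (hypothetical blockwise=False behaviour)."""
    rng = random.Random(seed)
    items = [value for block in blocks for value in block]
    rng.shuffle(items)
    n_test = int(len(items) * test_fraction)
    # Rows may end up on either side regardless of their original block.
    return items[n_test:], items[:n_test]


# Two blocks of four rows, mirroring da.arange(8, chunks=4).
blocks = [[0, 1, 2, 3], [4, 5, 6, 7]]
bw_train, bw_test = split_blockwise(blocks, 0.25)
gl_train, gl_test = split_global(blocks, 0.25)
```

In the blockwise case every block contributes its own share of test rows, so rows never cross block boundaries. In the global case the test rows can all come from a single block, which is the between-block shuffling the documentation describes for blockwise=False.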

Environment:

  • Dask version: 2024.4.4
  • Python version: 3.9.18
  • Operating System:
  • Install method (conda, pip, source): pip
