[<Ray component: Data>] Lack of check for empty table produces lots of error messages · Issue #53605 · ray-project/ray · GitHub
Open
@yangs16

Description

What happened + What you expected to happen

Lines 460-492 of `python/ray/data/_internal/pandas_block.py`:

When `self.table` is empty, this code block should be skipped. Otherwise it can produce many unnecessary error messages (logged at line 492), such as the `Error calculating size for column 'b'` message in the log.
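A minimal sketch of the proposed guard, assuming the size calculation applies `np.vectorize` to each column's values (the `vectorize` error in the log suggests this). The helper `estimate_object_column_size` is hypothetical and is not Ray's actual `pandas_block.py` code; it simply returns 0 for an empty column instead of calling `np.vectorize`:

```python
import sys

import numpy as np
import pandas as pd


def estimate_object_column_size(column: pd.Series) -> int:
    """Hypothetical helper (not Ray's code): rough byte size of a column.

    Sums sys.getsizeof over each element, a per-element estimate that
    relies on np.vectorize.
    """
    values = column.to_numpy()
    # Guard proposed in this issue: skip the calculation for an empty column.
    # Without it, np.vectorize raises ValueError on size-0 input.
    if values.size == 0:
        return 0
    sizes = np.vectorize(sys.getsizeof)(values)
    return int(sizes.sum())


df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
print(estimate_object_column_size(df["b"]))          # positive size
print(estimate_object_column_size(df.head(0)["b"]))  # 0, no error
```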

```
2025-06-06 15:32:35,388	INFO worker.py:1694 -- Connecting to existing Ray cluster at address: 127.0.0.1:6379...
2025-06-06 15:32:35,399	INFO worker.py:1879 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265 
2025-06-06 15:32:37,222	INFO logging.py:290 -- Registered dataset logger for dataset dataset_13_0
(get_table_block_metadata pid=86973) Error calculating size for column 'b': cannot call `vectorize` on size 0 inputs unless `otypes` is set
2025-06-06 15:32:37,250	INFO streaming_executor.py:117 -- Starting execution of Dataset dataset_13_0. Full logs are in /tmp/ray/session_2025-06-06_14-54-59_325275_83327/logs/ray-data
2025-06-06 15:32:37,250	INFO streaming_executor.py:118 -- Execution plan of Dataset dataset_13_0: InputDataBuffer[Input] -> LimitOperator[limit=20]
Running 0: 0.00 row [00:00, ? row/s]
2025-06-06 15:32:37,288	INFO streaming_executor.py:220 -- ✔️  Dataset dataset_13_0 execution finished in 0.04 seconds
✔️  Dataset dataset_13_0 execution finished in 0.04 seconds: : 0.00 row [00:00, ? row/s] 
- limit=20: Tasks: 0; Queued blocks: 0; Resources: 0.0 CPU, 124.0B object store: : 0.00 row [00:00, ? row/s]
```
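The `vectorize` message in the log comes from NumPy itself and can be reproduced without Ray. When `otypes` is not given, `np.vectorize` infers the output type by calling the function on the first element, so a size-0 input raises `ValueError`; declaring `otypes` up front returns an empty array instead. Here `sys.getsizeof` is only an assumed stand-in for whatever per-element size function Ray actually uses:

```python
import sys

import numpy as np

empty_strings = np.array([], dtype=object)  # like the empty column 'b'

# Without otypes, np.vectorize calls the function on the first element to
# infer the output dtype; a size-0 input has no first element, so it raises.
try:
    np.vectorize(sys.getsizeof)(empty_strings)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # cannot call `vectorize` on size 0 inputs unless `otypes` is set

# Declaring otypes skips the inference step, so an empty input
# simply yields an empty result.
sizes = np.vectorize(sys.getsizeof, otypes=[np.int64])(empty_strings)
print(sizes.shape, int(sizes.sum()))  # (0,) 0
```

Either skipping empty columns or passing `otypes` would avoid the exception; the early return also avoids doing any per-element work when there are no rows.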

Versions / Dependencies

This issue was introduced in version 2.41.0.

Reproduction script

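```python
import pandas
import ray

# A two-column frame; head(0) keeps the columns but drops every row.
df = pandas.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
df_empty = df.head(0)

# Converting to a Dataset computes block metadata for the empty frame.
ds = ray.data.from_pandas(df_empty)

# Execute the dataset and fetch a batch.
ds.take_batch()
```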

Issue Severity

Low: It annoys or frustrates me.

Metadata

Labels

- P0 (Issues that should be fixed in short order)
- bug (Something that is supposed to be working, but isn't)
- data (Ray Data-related issues)
- usability
