Description
What happened + What you expected to happen
Lines 460-492 of python/ray/data/_internal/pandas_block.py:
When self.table is empty, this code block should be skipped. Otherwise it can produce many unnecessary error messages (logged at line 492), such as the "Error calculating size" line in the output below.
2025-06-06 15:32:35,388 INFO worker.py:1694 -- Connecting to existing Ray cluster at address: 127.0.0.1:6379...
2025-06-06 15:32:35,399 INFO worker.py:1879 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265
2025-06-06 15:32:37,222 INFO logging.py:290 -- Registered dataset logger for dataset dataset_13_0
(get_table_block_metadata pid=86973) Error calculating size for column 'b': cannot call `vectorize` on size 0 inputs unless `otypes` is set
2025-06-06 15:32:37,250 INFO streaming_executor.py:117 -- Starting execution of Dataset dataset_13_0. Full logs are in /tmp/ray/session_2025-06-06_14-54-59_325275_83327/logs/ray-data
2025-06-06 15:32:37,250 INFO streaming_executor.py:118 -- Execution plan of Dataset dataset_13_0: InputDataBuffer[Input] -> LimitOperator[limit=20]
Running 0: 0.00 row [00:00, ? row/s] 2025-06-06 15:32:37,288 INFO streaming_executor.py:220 -- ✔️ Dataset dataset_13_0 execution finished in 0.04 seconds00 row [00:00, ? row/s]
✔️ Dataset dataset_13_0 execution finished in 0.04 seconds: : 0.00 row [00:00, ? row/s]
- limit=20: Tasks: 0; Queued blocks: 0; Resources: 0.0 CPU, 124.0B object store: : 0.00 row [00:00, ? row/s]
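The message in the log is the one NumPy raises when `np.vectorize` is called on a zero-length input without `otypes`. A minimal sketch of that failure and of the proposed guard follows. It assumes the size estimate applies a vectorized function to each object column; `object_column_nbytes` is a hypothetical helper for illustration, not the code in pandas_block.py.

```python
import sys

import numpy as np


def object_column_nbytes(values: np.ndarray) -> int:
    """Hypothetical helper: estimate the size in bytes of an object column.

    Illustration only; this is not Ray's implementation.
    """
    # Guard: np.vectorize cannot infer an output dtype from zero elements,
    # so return early instead of calling it on an empty column.
    if values.size == 0:
        return 0
    sizes = np.vectorize(sys.getsizeof)(values)
    return int(sizes.sum())


empty = np.array([], dtype=object)
full = np.array(["a", "b", "c"], dtype=object)

# Without the guard, the vectorized call fails on the empty column.
try:
    np.vectorize(sys.getsizeof)(empty)
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised, message)
print(object_column_nbytes(empty))
print(object_column_nbytes(full) > 0)
```

Checking for an empty table once, before iterating over columns, would avoid one error message per object column. Passing `otypes` to `np.vectorize` is another possible way to avoid the exception, but the early return also skips the unnecessary work.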
Versions / Dependencies
This issue was introduced in version 2.41.0.
Reproduction script
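import pandas
import ray
# Build a small DataFrame, then take zero rows to get an empty frame that keeps the schema.
df = pandas.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
df_empty = df.head(0)
# Create a dataset from the empty frame and read a batch; the error message appears in the log.
ds = ray.data.from_pandas(df_empty)
ds.take_batch()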
Issue Severity
Low: It annoys or frustrates me.