Cannot extract MySQL datetime value 0000-00-00 00:00:00 with pyarrow or connectorx · Issue #2699 · dlt-hub/dlt · GitHub
We have MySQL 5.7.24 tables containing some datetime values set to '0000-00-00 00:00:00'. This seems to be a MySQL-specific behavior that occurs when columns are not nullable and strict checking is not enabled. I assume that naive type coercion, expecting '1970-01-01 00:00:00' as the sensible min() value, results in a negative / invalid timestamp here.
However, a quick internet search indicates this is a well-known quirk, already accounted for by many tools.
Using backend set to either 'sqlalchemy' or 'pandas', dlt does just fine, and I get the correct '1970-01-01 00:00:00' in my destination (clickhouse, not that it's important for the issue; the problem appears during extraction).
Using 'pyarrow' as a backend, I am getting:
Traceback (most recent call last):
File "/root/test_dlt/cgrates_cdrs_pipeline.py", line 124, in <module>
load_standalone_table_resource()
File "/root/test_dlt/cgrates_cdrs_pipeline.py", line 57, in load_standalone_table_resource
info = pipeline.extract([cgrates_cdrs], write_disposition="append", workers=8)
File "/usr/lib/python3.9/site-packages/dlt/pipeline/pipeline.py", line 222, in _wrap
step_info = f(self, *args, **kwargs)
File "/usr/lib/python3.9/site-packages/dlt/pipeline/pipeline.py", line 176, in _wrap
rv = f(self, *args, **kwargs)
File "/usr/lib/python3.9/site-packages/dlt/pipeline/pipeline.py", line 162, in _wrap
return f(self, *args, **kwargs)
File "/usr/lib/python3.9/site-packages/dlt/pipeline/pipeline.py", line 271, in _wrap
return f(self, *args, **kwargs)
File "/usr/lib/python3.9/site-packages/dlt/pipeline/pipeline.py", line 489, in extract
raise PipelineStepFailed(
dlt.pipeline.exceptions.PipelineStepFailed: Pipeline execution failed at stage extract when processing package 1748586785.3124416 with exception:
<class 'dlt.extract.exceptions.ResourceExtractionError'>
In processing pipe cdrs_20250517: extraction of resource cdrs_20250517 in generator table_rows caused an exception: Conversion to arrow failed for field `answer_time` with dlt hint `data_type=timestamp` and `inferred_arrow_type=timestamp[us]` This data type seems currently unsupported by dlt. Please open a GitHub issue
Using connectorx as a backend, the error is:
thread '<unnamed>' panicked at /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mysql_common-0.29.2/src/value/convert/mod.rs:175:23:
Could not retrieve chrono::naive::datetime::NaiveDateTime from Value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
File "/root/test_dlt/cgrates_cdrs_pipeline.py", line 124, in <module>
load_standalone_table_resource()
File "/root/test_dlt/cgrates_cdrs_pipeline.py", line 57, in load_standalone_table_resource
info = pipeline.extract([cgrates_cdrs], write_disposition="append", workers=8)
File "/usr/lib/python3.9/site-packages/dlt/pipeline/pipeline.py", line 222, in _wrap
step_info = f(self, *args, **kwargs)
File "/usr/lib/python3.9/site-packages/dlt/pipeline/pipeline.py", line 176, in _wrap
rv = f(self, *args, **kwargs)
File "/usr/lib/python3.9/site-packages/dlt/pipeline/pipeline.py", line 162, in _wrap
return f(self, *args, **kwargs)
File "/usr/lib/python3.9/site-packages/dlt/pipeline/pipeline.py", line 271, in _wrap
return f(self, *args, **kwargs)
File "/usr/lib/python3.9/site-packages/dlt/pipeline/pipeline.py", line 468, in extract
self._extract_source(
File "/usr/lib/python3.9/site-packages/dlt/pipeline/pipeline.py", line 1254, in _extract_source
load_id = extract.extract(
File "/usr/lib/python3.9/site-packages/dlt/extract/extract.py", line 457, in extract
self._extract_single_source(
File "/usr/lib/python3.9/site-packages/dlt/extract/extract.py", line 380, in _extract_single_source
for pipe_item in pipes:
File "/usr/lib/python3.9/site-packages/dlt/extract/pipe_iterator.py", line 162, in __next__
pipe_item = self._get_source_item()
File "/usr/lib/python3.9/site-packages/dlt/extract/pipe_iterator.py", line 277, in _get_source_item
pipe_item = next(gen)
File "/usr/lib/python3.9/site-packages/dlt/sources/sql_database/helpers.py", line 303, in table_rows
yield from loader.load_rows(backend_kwargs)
File "/usr/lib/python3.9/site-packages/dlt/sources/sql_database/helpers.py", line 176, in load_rows
yield from self._load_rows_connectorx(query, backend_kwargs)
File "/usr/lib/python3.9/site-packages/dlt/sources/sql_database/helpers.py", line 234, in _load_rows_connectorx
df = cx.read_sql(conn, query_str, **backend_kwargs)
File "/usr/lib/python3.9/site-packages/connectorx/__init__.py", line 386, in read_sql
result = _read_sql(
pyo3_runtime.PanicException: Could not retrieve chrono::naive::datetime::NaiveDateTime from Value
Expected behavior
Coerce the 0000-00-00 00:00:00 to 1970-01-01 00:00:00 or let the destination do it.
With pandas, it seems the latter is the default behavior:
2025-05-30 06:50:44,824|[WARNING]|1749067|139738012964672|dlt|extractors.py|_compute_tables:487|In resource: cdrs_20250517, when merging arrow schema with dlt schema, several column hints were different. dlt schema hints were kept and arrow schema and data were unmodified. It is up to destination to coerce the differences when loading. Change log level to INFO for more details.
With sqlalchemy, it just works as expected.
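As a sketch of the requested coercion (my own illustration, not something dlt does today): a row-level transform that maps zero dates to the epoch. With the sqlalchemy backend rows arrive as dicts, so a transform like this could in principle be attached via `resource.add_map`; the wiring shown in the comment is hypothetical.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)
ZERO_DATES = {"0000-00-00 00:00:00", "0000-00-00"}

def coerce_zero_dates(row):
    """Replace MySQL zero-date values (which drivers often hand back as
    strings, since Python's datetime cannot represent them) with the epoch."""
    for key, value in row.items():
        if isinstance(value, str) and value in ZERO_DATES:
            row[key] = EPOCH
    return row

# Hypothetical wiring with the sqlalchemy backend:
#   resource.add_map(coerce_zero_dates)
```

This would not help the pyarrow or connectorx backends, which hand back Arrow tables rather than per-row dicts, so the value would need fixing before the Arrow conversion.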
Steps to reproduce
Have a source MySQL table with a datetime NOT NULL column
Use an old enough MySQL version that allows zero dates, or one where the NO_ZERO_DATE mode is not set (deprecated in 8.4, I believe)
Do not have strict mode enabled, if the above applies
Try to extract that table using the backends mentioned above (pyarrow, connectorx).
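A possible server-side workaround (my assumption, untested against this pipeline): rewrite the extraction query so MySQL itself replaces zero dates before the driver ever sees them, e.g. with NULLIF. A hypothetical helper that builds such a SELECT:

```python
def select_with_zero_dates_nulled(table, columns, datetime_columns):
    """Build a SELECT that NULLs out MySQL zero datetimes server-side,
    so pyarrow/connectorx never see the unrepresentable value.
    Hypothetical helper; names and wiring are illustrative only."""
    parts = []
    for col in columns:
        if col in datetime_columns:
            parts.append(f"NULLIF(`{col}`, '0000-00-00 00:00:00') AS `{col}`")
        else:
            parts.append(f"`{col}`")
    return f"SELECT {', '.join(parts)} FROM `{table}`"

# e.g. select_with_zero_dates_nulled("cdrs_20250517", ["id", "answer_time"], {"answer_time"})
```

If the column is hinted NOT NULL downstream, a literal epoch via IF(...) instead of NULLIF may fit better; either way, such a rewrite might be pluggable through the query-customization hooks of dlt's sql_database source (check the docs for your dlt version, this is an assumption on my part).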
Operating system
Linux
Runtime environment
Virtual Machine
Python version
3.9
dlt data source
sql_database using the mysql+pymysql driver
dlt destination
clickhouse 25.2.2.39
Other deployment details
No response
Additional information
I am quite new to dlt, started evaluating just yesterday in the hope of replacing my own custom Python code with it for our extract and load tasks from various sources to clickhouse, so bear with me for being a beginner.
Is there a workaround to allow those backends to work?
Is there any more testing I can do to help?
tramjoe changed the title from "Cannot load MySQL datetime value 0000-00-00 00:00:00 with pyarrow or connectorx" to "Cannot extract MySQL datetime value 0000-00-00 00:00:00 with pyarrow or connectorx" on May 30, 2025
@tramjoe looks like a connector-x bug. We are investigating several related bugs with wrong date-time handling. There's a new connector-x version where the arrow2 backend was removed. Maybe you can try it with the arrow backend? Check here for more info: #2661 (comment)
tl;dr: this is a connector-x bug, so if it is not fixed in the newest version you need to file a bug report there.
dlt version
1.11.0