-
-
Notifications
You must be signed in to change notification settings - Fork 44
Partitioning should not overwrite schema #1752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Flow PHP - BenchmarksResults of the benchmarks from this PR are compared with the results from 1.x branch. Extractors+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench | bench_extract_10k | 1 | 3 | 4.846mb -0.00% | 437.282ms -0.62% | ±0.11% -67.95% |
| ExcelExtractorBench | bench_extract_10k_ods | 1 | 3 | 65.541mb -0.00% | 1.063s -0.05% | ±1.89% +91.16% |
| ExcelExtractorBench | bench_extract_10k_xlsx | 1 | 3 | 67.587mb -0.00% | 1.680s -0.41% | ±0.77% +15.89% |
| JsonExtractorBench | bench_extract_10k | 1 | 3 | 5.438mb -0.03% | 1.162s -1.66% | ±0.56% -71.35% |
| ParquetExtractorBench | bench_extract_10k | 1 | 3 | 86.399mb -0.00% | 901.605ms +0.07% | ±1.12% +266.17% |
| TextExtractorBench | bench_extract_10k | 1 | 3 | 4.569mb -0.00% | 42.004ms -0.95% | ±0.10% -91.25% |
| XmlExtractorBench | bench_extract_10k | 1 | 3 | 4.554mb -0.00% | 602.358ms -1.21% | ±0.90% -23.90% |
+-----------------------+------------------------+------+-----+-----------------+------------------+-----------------+
Transformers+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
| RenameEachEntryTransformerBench | bench_transform_10k_rows | 1 | 3 | 18.565mb -0.00% | 73.624ms +0.66% | ±1.23% +47.98% |
| RenameEntryTransformerBench | bench_transform_10k_rows | 1 | 3 | 123.303mb -0.00% | 67.322ms +0.54% | ±0.90% +600.18% |
+---------------------------------+--------------------------+------+-----+------------------+-----------------+-----------------+
Loaders+--------------------+----------------+------+-----+------------------+------------------+----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
| CSVLoaderBench | bench_load_10k | 1 | 3 | 62.506mb +0.00% | 88.919ms +0.28% | ±0.43% -76.11% |
| JsonLoaderBench | bench_load_10k | 1 | 3 | 80.587mb +0.00% | 102.855ms -0.73% | ±0.52% +31.25% |
| ParquetLoaderBench | bench_load_10k | 1 | 3 | 166.299mb +0.00% | 2.012s +0.29% | ±0.30% -52.69% |
| TextLoaderBench | bench_load_10k | 1 | 3 | 17.871mb +0.00% | 30.673ms -2.85% | ±0.41% -55.45% |
+--------------------+----------------+------+-----+------------------+------------------+----------------+
Building Blocks+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark | subject | revs | its | mem_peak | mode | rstdev |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| TypeDetectorBench | bench_type_detector | 1 | 3 | 42.515mb +0.00% | 409.041ms +0.09% | ±0.85% +85.20% |
| TypeDetectorBench | bench_type_detector | 1 | 3 | 11.572mb +0.00% | 80.910ms -1.81% | ±0.58% -68.92% |
| EntryFactoryBench | bench_entry_factory | 1 | 3 | 105.984mb +0.00% | 649.393ms +0.19% | ±0.49% -56.10% |
| EntryFactoryBench | bench_entry_factory | 1 | 3 | 55.259mb +0.00% | 326.852ms +0.50% | ±0.63% -51.88% |
| EntryFactoryBench | bench_entry_factory | 1 | 3 | 14.845mb +0.00% | 69.885ms -1.07% | ±0.50% -33.37% |
| RowsBench | bench_chunk_10_on_10k | 2 | 3 | 93.455mb +0.00% | 3.500ms -2.81% | ±0.41% -80.01% |
| RowsBench | bench_diff_left_1k_on_10k | 2 | 3 | 110.826mb +0.00% | 243.040ms +1.55% | ±0.99% +165.58% |
| RowsBench | bench_diff_right_1k_on_10k | 2 | 3 | 93.546mb +0.00% | 24.463ms -0.57% | ±0.59% +53.73% |
| RowsBench | bench_drop_1k_on_10k | 2 | 3 | 94.330mb +0.00% | 1.520ms -12.19% | ±3.11% +1.97% |
| RowsBench | bench_drop_right_1k_on_10k | 2 | 3 | 94.330mb +0.00% | 1.442ms -19.49% | ±1.74% -51.98% |
| RowsBench | bench_entries_on_10k | 2 | 3 | 92.490mb +0.00% | 3.661ms -3.44% | ±1.99% -33.42% |
| RowsBench | bench_filter_on_10k | 2 | 3 | 93.019mb +0.00% | 16.177ms +1.09% | ±1.44% +39.55% |
| RowsBench | bench_find_on_10k | 2 | 3 | 93.019mb +0.00% | 15.615ms +0.85% | ±1.27% -20.16% |
| RowsBench | bench_find_one_on_10k | 10 | 3 | 91.708mb +0.00% | 1.994μs -0.30% | ±2.40% +0.00% |
| RowsBench | bench_first_on_10k | 10 | 3 | 91.708mb +0.00% | 0.400μs 0.00% | ±0.00% 0.00% |
| RowsBench | bench_flat_map_on_1k | 2 | 3 | 100.769mb +0.00% | 14.851ms -0.71% | ±2.32% -14.53% |
| RowsBench | bench_map_on_10k | 2 | 3 | 130.196mb +0.00% | 68.055ms -1.07% | ±1.08% +218.24% |
| RowsBench | bench_merge_1k_on_10k | 2 | 3 | 93.539mb +0.00% | 1.436ms -3.34% | ±0.68% -74.03% |
| RowsBench | bench_partition_by_on_10k | 2 | 3 | 96.908mb +0.00% | 64.933ms +2.10% | ±1.31% +50.33% |
| RowsBench | bench_remove_on_10k | 2 | 3 | 94.592mb +0.00% | 3.589ms -1.48% | ±2.61% +58.07% |
| RowsBench | bench_sort_asc_on_1k | 2 | 3 | 92.070mb +0.00% | 40.427ms -0.13% | ±1.36% -51.19% |
| RowsBench | bench_sort_by_on_1k | 2 | 3 | 92.070mb +0.00% | 39.577ms -5.26% | ±2.31% +65.13% |
| RowsBench | bench_sort_desc_on_1k | 2 | 3 | 92.070mb +0.00% | 40.167ms -0.80% | ±1.10% +213.14% |
| RowsBench | bench_sort_entries_on_1k | 2 | 3 | 94.151mb +0.00% | 8.227ms -3.01% | ±2.62% +227.16% |
| RowsBench | bench_sort_on_1k | 2 | 3 | 91.901mb +0.00% | 29.975ms -1.25% | ±1.81% +74.24% |
| RowsBench | bench_take_1k_on_10k | 10 | 3 | 91.708mb +0.00% | 14.442μs -3.01% | ±2.26% -26.62% |
| RowsBench | bench_take_right_1k_on_10k | 10 | 3 | 91.708mb +0.00% | 16.462μs -4.53% | ±2.30% +33.34% |
| RowsBench | bench_unique_on_1k | 2 | 3 | 110.827mb +0.00% | 241.318ms -1.44% | ±1.27% -12.67% |
+-------------------+----------------------------+------+-----+------------------+------------------+-----------------+
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## 1.x #1752 +/- ##
=======================================
Coverage 81.30% 81.30%
=======================================
Files 717 717
Lines 19949 19948 -1
=======================================
Hits 16219 16219
+ Misses 3730 3729 -1
🚀 New features to boost your workflow:
|
{ | ||
$this->setupFiles([__FUNCTION__ => []]); | ||
|
||
$fs = new FakeNativeLocalFilesystem(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this be replaced with src/lib/filesystem/src/Flow/Filesystem/Local/MemoryFilesystem.php
maybe? I'm not sure how I feel about Fake Filesystem in ETL package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Memory doesn't support mv()
method, so it will unfortunately not help :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could create a dummy to log action calls, if that copy is a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I tried before merging, ended up moving this double to Filesystem package
if ($fileStream->isOpen()) { | ||
$fileStream->close(); | ||
} | ||
|
||
if ($this->saveMode === SaveMode::Overwrite) { | ||
if ($fileStream->path()->partitions()->count()) { | ||
$partitionFilesPatter = new Path($fileStream->path()->parentDirectory()->path() . '/*', $fileStream->path()->options()); | ||
$partitionFilesPatter = new Path($fileStream->path()->parentDirectory()->uri() . '/*', $fileStream->path()->options()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌
Resolves: #1750
Added new (in fact, it's a copy of the existing filesystem stream) filesystem with a custom scheme to confirm that the protocol is lost when partitioning is done.
Change Log
Added
Fixed
Changed
Removed
Deprecated
Security