8000 ATTACH OR REPLACE database to allow swapping of new data. by xevix · Pull Request #15355 · duckdb/duckdb · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

ATTACH OR REPLACE database to allow swapping of new data. #15355

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 8, 2025

Conversation

xevix
Copy link
@xevix xevix commented Dec 16, 2024

Addresses #15208

Adds support for replacing existing attached database.

This is just a proposal for more concrete discussion, but no worries at all if this doesn't fit with the vision for ATTACH functionality in DuckDB. This is my first PR to DuckDB so apologies ahead of time if I missed something critical.

Example usage:

-- Attach as usual
ATTACH 'taxi_v1.duckdb' AS taxi;
-- New file comes in, attach new file which automatically detaches the old one
ATTACH OR REPLACE 'taxi_v2.duckdb' AS taxi;

Application query:

-- Query can stay the same, without needing to update the DB name, nor deal with DETACH etc.
SELECT DISTINCT vendor_name FROM taxi.taxi_data;

@duckdb-draftbot duckdb-draftbot marked this pull request as draft December 19, 2024 23:04
@xevix xevix marked this pull request as ready for review December 20, 2024 18:48
@Mytherin
Copy link
Collaborator
Mytherin commented Jan 2, 2025

Thanks for the PR! Looks good to me - good idea.

@carlopi
Copy link
Contributor
carlopi commented Jan 2, 2025

Cool idea!

I looked at this trying to find problems with the fact that dependencies are left hanging, but DuckDB seems to handle that fine (errors do make sense), and given this is opt-in it's likely OK.

One bit that is not working as it should is:

ATTACH OR REPLACE 'name.db' as db;
--- works
ATTACH 'sqlite:name.db' as db2;
--- fails on load of sqlite_scanner, see below
HTTP Error:
Failed to download extension "sqlite_scanner" at URL "http://extensions.duckdb.org/ad9ab5f550/osx_arm64/sqlite_scanner.duckdb_extension.gz" (HTTP 403)
Extension "sqlite_scanner" is an existing extension.

Are you using a development build? In this case, extensions might not (yet) be uploaded.
ATTACH OR REPLACE 'name.db' as db;
--- works
ATTACH OR REPLACE 'sqlite:name.db' as db;
--- works

I think ATTACH OR REPLACE should either manage to attempt install & loading of sqlite_scanner or in any case some refactoring of the ATTACH checks should be shared so an error can be here thrown (like ATTACH OR REPLACE do not support extensions-provided databases).

There might be other corner cases in this area, or code might be added in the future, so maybe it's best to share more code between ATTACH OR REPLACE and simple ATTACH.

EDIT: actually the problem is only given the name is the same, and so it get skipped due to the early check on the name. Unsure if including the type should make this work there.

Copy link
Collaborator
@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @carlopi for the investigation, could you resolve the issue he found? Otherwise can you re-generate the grammar to fix the merge conflict? Then I think this is good to go.

return SourceResultType::FINISHED;
if (info->on_conflict == OnCreateConflict::REPLACE_ON_CONFLICT) {
// same path, same name, DB does not need replacing
if (existing_db->GetCatalog().GetDBPath() == path) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As found by @carlopi this needs to check GetCatalogType as well

@duckdb-draftbot duckdb-draftbot marked this pull request as draft January 3, 2025 18:13
@xevix xevix force-pushed the attach-or-replace branch from db33648 to 6b1d755 Compare January 3, 2025 18:14
@xevix
Copy link
Author
xevix commented Jan 3, 2025

Thanks for the review!

Great catch on the types, I'll add a check to ensure we only skip reattach if it also matches the type.

I feel like same path/name but different type should be an explicit error, since if a DuckDB/SQLite file is attached at a given path and in active use, it's hard to think of a valid use case where it would be externally overwritten as a different type. For now I've left it as is, but please let me know if it makes sense to throw a BinderError instead, since it could highlight user error of trying to reattach with a different type by mistake.

I may have found a separate bug. It seems if you ATTACH a duckdb file but claim that it's an sqlite file, the ATTACH will succeed, but this will lead to issues later. I might file this one separately once confirmed.

$ duckdb
v1.1.3 19864453f7
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D ATTACH 'duck.db' AS db;
D CREATE TABLE db.foo (s TEXT);
D DETACH db;
D ATTACH 'sqlite:duck.db' AS db;
D .tables
Invalid Error: Failed to prepare query "SELECT name FROM sqlite_master WHERE type='table'": file is not a database

@carlopi
Copy link
Contributor
carlopi commented Jan 3, 2025

Sqlite was an example that do not have a lot of practical sense given a file can't be both formats at the same time, but extensions can potentially override/change behaviour in various ways, so given we support that I though it was worth supporting it also here.

Idea is that file.db and extension_x:file.db can have very different meaning (even just the fact that the extension will be loaded).

The fact that sqlite:file.db do not independently checks for the file to be a valid sqlite file I think can be possibly improved on, but not critical, given one could argue user gave indication on the file format.
Note that default behaviour is that specifying a type explicitly is not needed if sqlite or duckdb, since it will be auto-detected.

@xevix
Copy link
Author
xevix commented Jan 3, 2025

@carlopi Ah ok, that makes sense to me. I'll leave it as-is then, thanks for the explanation.

Should I update the ATTACH documentation to reflect the new syntax, or is it better to wait for review/tests to complete first?

@carlopi
Copy link
Contributor
carlopi commented Jan 3, 2025

@xevix: sending a PR to duckdb/duckdb-web making clear in the PR message that it's connected to a 1.2 feature would be great.

If PR is ready, can you undraft the PR? (we draft them automatically on changes)

@xevix xevix marked this pull request as ready for review January 3, 2025 21:40
@xevix
Copy link
Author
xevix commented Jan 10, 2025

@carlopi I think things should be good to go, but please let me know if there’s anything else I should do. Good luck with the 1.2 release 🎉

@xevix
Copy link
Author
xevix commented Feb 5, 2025

Hey all, I retested the PR merging in latest main so it should still be good to go. Might have to roll this documentation back until then duckdb/duckdb-web#4467.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 6, 2025 06:50
@Mytherin Mytherin marked this pull request as ready for review February 6, 2025 06:52
@xevix
Copy link
Author
xevix commented Feb 6, 2025

Thanks for having a look. I tested building reldebug from the main branch and see the same behavior failing to have sqlite installed/loaded on attach.

[main][~/external/duckdb]$ ./build/reldebug/duckdb
v1.2.1-dev63 a35a3fba95
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D ATTACH 'sqlite:foo.db';
IO Error:
Extension "/Users/xevix/.duckdb/extensions/a35a3fba95/osx_arm64/sqlite_scanner.duckdb_extension" not found.
Extension "sqlite_scanner" is an existing extension.

Install it first using "INSTALL sqlite_scanner".

Using the require in the test was the main way I found to load it in, is there a better way perhaps?

@Mytherin
Copy link
Collaborator
Mytherin commented Feb 6, 2025

Can we not test this without the SQLite scanner? Adding a test depending on the SQLite scanner to the main repo means the test will not be triggered in most of the CI here.

@xevix
Copy link
Author
xevix commented Feb 6, 2025

I think so, if there's another type of file-based DB that can be ATTACHed at the same path without needing to require it, that would work. Btw httpfs seems to be required in this other test, is the concern specifically with the sqlite scanner being required, or general core extensions?

I can also remove this line from the test, it was mainly to address @carlopi 's concern above. Please let me know and I'll remove it.

@Mytherin
Copy link
Collaborator
Mytherin commented Feb 6, 2025

httpfs is a bit of a special case - there are CI runs that specifically run httpfs test cases that are part of the DuckDB test suite. This doesn't apply to other out-of-tree extensions.

I'm fine with removing the line from the test.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft February 6, 2025 21:13
@xevix xevix marked this pull request as ready for review February 6, 2025 21:14
@xevix xevix requested a review from Mytherin February 7, 2025 18:58
@Mytherin Mytherin merged commit 8a352a7 into duckdb:main Feb 8, 2025
50 checks passed
@Mytherin
Copy link
Collaborator
Mytherin commented Feb 8, 2025

Thanks!

Antonov548 added a commit to Antonov548/duckdb-r that referenced this pull request Feb 27, 2025
ATTACH OR REPLACE database to allow swapping of new data. (duckdb/duckdb#15355)
krlmlr pushed a commit to duckdb/duckdb-r that referenced this pull request Mar 5, 2025
ATTACH OR REPLACE database to allow swapping of new data. (duckdb/duckdb#15355)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
ATTACH OR REPLACE database to allow swapping of new data. (duckdb/duckdb#15355)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
ATTACH OR REPLACE database to allow swapping of new data. (duckdb/duckdb#15355)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025
ATTACH OR REPLACE database to allow swapping of new data. (duckdb/duckdb#15355)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
ATTACH OR REPLACE database to allow swapping of new data. (duckdb/duckdb#15355)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0