fix: Taps running in parallel use a different temporary catalog file #8794

edgarrmondragon · 2024-09-23T19:37:16Z

Description

Related Issues

Closes feature: Catalog extra still puts a properties.json file in .meltano #8763

netlify · 2024-09-23T19:37:34Z

✅ Deploy Preview for meltano canceled.

Name	Link
🔨 Latest commit	`7e27411`
🔍 Latest deploy log	https://app.netlify.com/sites/meltano/deploys/66f1c36fe7b12f00089fb43a

dluo-sig · 2024-09-24T23:27:09Z

src/meltano/core/plugin/singer/tap.py

@@ -233,7 +233,7 @@ def config_files(self):  # noqa: ANN201
        """Get the configuration files for this tap."""
        return {
            "config": f"tap.{self.instance_uuid}.config.json",
-            "catalog": "tap.properties.json",
+            "catalog": f"tap.{self.instance_uuid}.properties.json",
            "catalog_cache_key": "tap.properties.cache_key",
            "state": "state.json",


I was checking up on the other issue and was curious about the fix. On a similar but unrelated note, I'm wondering whether state.json here also needs a fix. It seems to be the same state file that is stored in Azure when using a different state backend. However, the difference I've noticed is that on Azure, it actually gets split out based on the --state-id-suffix provided to meltano run. I suppose something similar would happen when running meltano el with --state-id set. Locally, though, it also maintains this state, just in a fixed location. Is it intended to still maintain this local state file when using a different state backend?

Yeah, I don't know if we should keep the local state.json after the pipeline finishes 🤔.

FWIW I don't think this PR is currently on the right path. A more correct approach may be to have a dedicated directory per run. That way, plugins could run in parallel without clashing with each other's files:

tap-example --config /path/to/run-1/tap.config.json --state /path/to/run-1/state.json --catalog /path/to/run-1/tap.properties.json
tap-example --config /path/to/run-2/tap.config.json --state /path/to/run-2/state.json --catalog /path/to/run-2/tap.properties.json

During the preparation phase, we'd:

- dump the resolved config to /path/to/run-1/tap.config.json
- get /path/to/run-1/state.json from the state backend
- copy /path/to/run-1/tap.properties.json from cache, or regenerate it
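A minimal Python sketch of the per-run preparation steps listed above, assuming a plain dict stands in for the state backend and a `discover` callable stands in for catalog discovery. `prepare_run_dir` and its parameters are hypothetical, not Meltano internals; the point is only that a `run-<uuid>` directory gives each invocation its own config, state and catalog files, so parallel runs never share a path.

```python
"""Hypothetical sketch of a per-run working directory for a Singer tap.

Assumptions (not Meltano's API): the state backend is a dict mapping state IDs
to state payloads, and the catalog cache is a single file path.
"""
import json
import shutil
import tempfile
import uuid
from pathlib import Path
from typing import Callable, Dict, List, Optional


def prepare_run_dir(
    base_dir: Path,
    executable: str,
    config: dict,
    state_backend: Dict[str, dict],
    state_id: str,
    cached_catalog: Optional[Path],
    discover: Callable[[], dict],
) -> List[str]:
    """Create base_dir/run-<uuid>/ and return the tap's command line."""
    run_dir = base_dir / f"run-{uuid.uuid4()}"
    run_dir.mkdir(parents=True)

    # 1. Dump the resolved config into this run's directory.
    config_path = run_dir / "tap.config.json"
    config_path.write_text(json.dumps(config))

    # 2. Fetch this state ID's state from the (hypothetical) backend.
    state_path = run_dir / "state.json"
    state_path.write_text(json.dumps(state_backend.get(state_id, {})))

    # 3. Copy the catalog from cache if present, otherwise regenerate it.
    catalog_path = run_dir / "tap.properties.json"
    if cached_catalog is not None and cached_catalog.exists():
        shutil.copyfile(cached_catalog, catalog_path)
    else:
        catalog_path.write_text(json.dumps(discover()))

    return [
        executable,
        "--config", str(config_path),
        "--state", str(state_path),
        "--catalog", str(catalog_path),
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        backend = {"example-state-id": {"bookmarks": {}}}
        # Two preparations of the same tap get disjoint file paths.
        for _ in range(2):
            print(prepare_run_dir(Path(tmp), "tap-example", {"api_key": "secret"},
                                  backend, "example-state-id", None,
                                  lambda: {"streams": []}))
```

Compared with suffixing each file name with a UUID, the directory approach keeps the file names fixed (tap.config.json, state.json, tap.properties.json) and makes cleanup a single directory removal.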

Agreed, I did mention that as a possibility as well, but I'm not familiar with what's involved in doing all of that. The idea is to use the run UUID to create a unique folder. You would theoretically get the same behavior if all of the files had the UUID in their names as well; it's just a matter of what the desired folder structure is. I believe the local state does get removed once the run is done. I just wasn't sure whether the local state is needed if it's kept in a different backend anyway, but I guess it does need to be there in order to upload? In that case, maybe the UUID folder is still ultimately better, because the state file wouldn't need to be renamed. This does seem to get complicated, though, because you would want to reuse the same state somehow across runs.

Maybe it should read the state from whatever the state backend is (cloud or local) from a central location without the UUID. Then the execution would create the state in a UUID folder and finally update the state backend after completion? I'm also not sure how state gets merged if multiple processes that started in parallel end up writing to the same file in the state backend. For us, we maintain a separate state file per table using --state-id-suffix, so it's not really an issue.

Or I might just be overthinking this whole thing.
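The read-centrally, run-in-a-UUID-folder, write-back-on-completion flow described above could look roughly like the following Python sketch. `InMemoryStateBackend`, `run_with_isolated_state` and `fake_tap` are hypothetical stand-ins (not Meltano's state backend API or a real tap), and the state IDs in the demo are illustrative, one per table in the spirit of --state-id-suffix.

```python
"""Hypothetical sketch: read state centrally, run in an isolated folder, write back.

Assumptions: InMemoryStateBackend is a toy stand-in for a state backend, and
run_tap stands in for the tap process, which rewrites state.json in its folder.
"""
import json
import tempfile
import threading
import uuid
from pathlib import Path
from typing import Callable, Dict


class InMemoryStateBackend:
    """Toy state backend keyed by state ID, guarded by a lock."""

    def __init__(self) -> None:
        self._states: Dict[str, dict] = {}
        self._lock = threading.Lock()

    def get(self, state_id: str) -> dict:
        with self._lock:
            # JSON round trip returns an independent copy of the stored state.
            return json.loads(json.dumps(self._states.get(state_id, {})))

    def set(self, state_id: str, state: dict) -> None:
        with self._lock:
            self._states[state_id] = state


def run_with_isolated_state(
    backend: InMemoryStateBackend,
    state_id: str,
    run_tap: Callable[[Path], None],
) -> dict:
    """Copy state into a per-run folder, run the tap, then write state back."""
    with tempfile.TemporaryDirectory(prefix=f"run-{uuid.uuid4()}-") as tmp:
        state_path = Path(tmp) / "state.json"
        # Read from the central location, keyed by state ID only (no UUID).
        state_path.write_text(json.dumps(backend.get(state_id)))
        # The tap only ever touches the copy inside its own folder.
        run_tap(state_path)
        new_state = json.loads(state_path.read_text())
    # Update the backend only after the run has completed.
    backend.set(state_id, new_state)
    return new_state


def fake_tap(stream: str) -> Callable[[Path], None]:
    """Hypothetical tap stand-in that bumps a bookmark counter for one stream."""

    def run(state_path: Path) -> None:
        state = json.loads(state_path.read_text())
        bookmarks = state.setdefault("bookmarks", {})
        bookmarks[stream] = bookmarks.get(stream, 0) + 1
        state_path.write_text(json.dumps(state))

    return run


if __name__ == "__main__":
    backend = InMemoryStateBackend()
    threads = [
        threading.Thread(
            target=run_with_isolated_state,
            args=(backend, f"tap-example-to-target-example:{table}", fake_tap(table)),
        )
        for table in ("users", "orders")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(backend.get("tap-example-to-target-example:users"))
```

Note that this only avoids clashes because each table has its own state ID. Two parallel runs sharing one state ID would still race on the final write, with the last writer winning, so the merging question raised above remains open in this sketch.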

Yeah, though the failing tests point to an assumption I'm violating here 🤔

s7clarke10 · 2024-10-08T22:07:29Z

I believe we need to handle both the meltano run command with --state-id-suffix and the meltano el command with --state-id, so that uniqueness is provided regardless of which command is issued.

I am not sure whether the targeted changes are just for the meltano run command or whether they will help with meltano el commands as well.

fix: Taps running in parallel use a different temporary catalog file

7e27411

edgarrmondragon linked an issue Sep 23, 2024 that may be closed by this pull request

feature: Catalog extra still puts a properties.json file in .meltano #8763

Open

dluo-sig reviewed Sep 24, 2024


edgarrmondragon self-assigned this Nov 8, 2024

edgarrmondragon mentioned this pull request Nov 14, 2024

feature: Use standard platform locations for cache, logs, etc. #8895

Open
