Description
I was reading through the documentation on Luigi configuration (https://luigi.readthedocs.io/en/stable/configuration.html) and noticed that there is inconsistent usage of dashes and underscores in the config params. For example, under [core] there is: default-scheduler-host, hdfs-tmp-dir, parallel-scheduling-processes, ..., log_level, max_reschedules, parallel_scheduling.
I found #2133 and #2160, which replaced a lot of dash-style names with underscore-style ones, but not all of them. However, when I started digging into this, I noticed that it's not as simple as "underscores are recommended, dashes are deprecated".
The luigi core features (e.g. worker.py and scheduler.py) use the luigi.task.Config approach: they make a Config subclass with a bunch of luigi parameters to hold the configuration. With this approach, both underscore-style and dash-style keys in configuration files are supported. Dash-style entries in your configuration will, however, cause DeprecationWarnings.
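To make the behaviour described above concrete, here is a minimal sketch of a lookup that accepts both spellings and warns on the dash-style one. It uses only the standard library's configparser as a stand-in; the helper name lookup_both_styles, the warning text, and the sample config are hypothetical and are not Luigi's actual implementation.

```python
import configparser
import warnings

# Hypothetical config content: one key in each style.
CONFIG_TEXT = """
[core]
log_level = INFO
parallel-scheduling = true
"""


def lookup_both_styles(parser, section, name, default=None):
    """Illustrative helper (not Luigi's code).

    Try the underscore-style name first, then fall back to the
    dash-style spelling and emit a DeprecationWarning.
    """
    if parser.has_option(section, name):
        return parser.get(section, name)
    dashed = name.replace("_", "-")
    if dashed != name and parser.has_option(section, dashed):
        warnings.warn(
            f"[{section}] {dashed} (dash-style) should be spelled {name}",
            DeprecationWarning,
            stacklevel=2,
        )
        return parser.get(section, dashed)
    return default


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# Underscore-style key: found directly, no warning.
log_level = lookup_both_styles(parser, "core", "log_level")

# Dash-style key in the file: still found, but with a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    parallel = lookup_both_styles(parser, "core", "parallel_scheduling")
```

With a lookup like this, either spelling in the config file yields a value, which matches what I see for the core features.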
On the other hand, a lot of contrib modules (e.g. hadoop, spark, sqla, ...) use luigi.configuration.get_config().get() directly, which requires the exact spelling of the parameter name, in practice dash-style in all cases. If you put the parameter in underscore-style in your config file, it is not picked up.
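The effect of an exact-name lookup can be sketched with the standard library's configparser, used here as a stand-in for the direct get() call. The section name mymodule, the key job-queue, and the values are hypothetical and are not taken from any real contrib module.

```python
import configparser

# Hypothetical config file where the user chose underscore-style.
CONFIG_TEXT = """
[mymodule]
job_queue = high-priority
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# The module looks the key up by its exact dash-style name, so the
# user's underscore-style entry is missed and the fallback is used.
dash_value = parser.get("mymodule", "job-queue", fallback="default-queue")

# Only a lookup with the spelling used in the file finds the value.
underscore_value = parser.get("mymodule", "job_queue", fallback=None)
```

Here the user's setting is silently ignored: no error and no warning, just the default value.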
To sum up: officially (core luigi features) underscore-style configs are recommended, but for contrib modules only dash-style works in practice.
This creates a very confusing experience when you are trying to get your config working as expected.