Dotted syntax for arrays · Issue #693 · toml-lang/toml · GitHub

Dotted syntax for arrays #693


Closed
llacroix opened this issue Jan 3, 2020 · 16 comments

@llacroix
llacroix commented Jan 3, 2020

It would be nice to have an official fully qualified attribute syntax. I'm not sure exactly how to explain it, but the idea is to be able to represent a complete toml file using only key = value syntax.

For example:

[a]
b = 2
[a.c]
d = 3

Can be represented as

a.b = 2
a.c.d = 3

Which should be equivalent to this in JSON

{
  "a": {
     "b": 2,
     "c": {
         "d": 3
     }
  }
} 

From what I could read, I didn't find a way to include arrays into the mix so it only works for tables now.

To explain a bit more, the reason for this request is to be able to interpret a flat key-value store as an actual structure. One example where it would be very useful is docker labels and any kind of system using labels. Labels aren't structured, and by convention things are labelled using a dotted syntax. The nice thing about toml is that you can almost just load the values as is. With such an official syntax, toml parsers could parse inputs iteratively.

Allowing something like this:

labels = {
    'a.b': 2,
    'a.c.d': 3,
    'a."some val".value': 4
}
toml = Toml()
for key, value in labels.items():
    toml[key] = value

In this case, the keys would be properly evaluated to the example above plus {"a": {"some val": {"value": 4}}}.

The other side effect is that an object could be serialized to fully qualified attributes and inserted as labels that can be parsed back into normal toml.

This way, stores that can only save key-value pairs could still be used to store a complete structure.
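
A minimal sketch of the idea above, using a plain dict instead of the hypothetical Toml() class from the snippet. The helper names split_dotted_key and unflatten are invented for illustration, and the key splitter only handles bare keys and simple double-quoted keys without escapes, which is narrower than what TOML allows.

```python
import re

# One key segment: a double-quoted string (no escapes) or a bare key.
_SEGMENT = re.compile(r'"([^"]*)"|([A-Za-z0-9_-]+)')


def split_dotted_key(key):
    """Split a dotted key such as a."some val".value into its segments."""
    parts = []
    pos = 0
    while True:
        match = _SEGMENT.match(key, pos)
        if match is None:
            raise ValueError(f"invalid key segment at position {pos}: {key!r}")
        quoted, bare = match.groups()
        parts.append(quoted if quoted is not None else bare)
        pos = match.end()
        if pos == len(key):
            return parts
        if key[pos] != ".":
            raise ValueError(f"expected '.' at position {pos}: {key!r}")
        pos += 1


def unflatten(labels):
    """Build a nested dict from a flat mapping of dotted keys to values."""
    root = {}
    for key, value in labels.items():
        *parents, last = split_dotted_key(key)
        node = root
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{part!r} is already a value in {key!r}")
        node[last] = value
    return root


labels = {
    'a.b': 2,
    'a.c.d': 3,
    'a."some val".value': 4,
}
structure = unflatten(labels)
```

Because each label is handled on its own, entries can be fed in one at a time, which is the iterative loading described above.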

@marzer
Contributor
marzer commented Jan 3, 2020

I might be missing something here, but it seems as though TOML already has exactly what you're after with the dotted key syntax, e.g. this:

[a]
b = 2
[a.c]
d = 3

and this:

a.b = 2
a.c.d = 3

would both be equally represented as JSON:

{ "a" : { "b" : 2, "c" : { "d" : 3 } } }

(and are semantically the same in TOML too)

@llacroix
Author
llacroix commented Jan 3, 2020

@marzer As I said, it works with tables/objects, but with arrays, how do you represent

[a]
some_val = 1
[[a.b]]
b = 2
[[a.b]]
b = 3

In a flat format?

My point is mainly that the fact that it works is only a side effect of the parser and not intentional, as far as I could tell from the existing issues.
Making it official would keep the behaviour consistent and also make sure a whole data structure, including nested arrays, is handled correctly, along with any other potential extension in the future.

@marzer
Contributor
marzer commented Jan 3, 2020
a = { some_val = 1, b = [{ b = 2 }, { b = 3 }] }

or

a.some_val = 1
a.b = [{ b = 2 }, { b = 3 }]

... etc.

The spec says it's designed to map unambiguously to a hash table, so I think it's intentional.

@llacroix
Author
llacroix commented Jan 4, 2020

The two examples are hardly a flat format.
As I said in the first comment, the goal is to be able to use them to store a structure in a key-value store.
If you have, for example, a key: value store, chances are the value can only be a string, so storing an inline array like that has little value.

A syntax like this would be more useful:

a.some_val = 1
a.b[0].b = 2
a.b[1].b = 3

As it can represent a complete structure where the value remains an "atom".
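
To make the proposal concrete, here is a rough sketch of how such indexed keys could be folded back into a structure. The bracket syntax is not valid TOML; the function name set_indexed, the restriction to bare keys, and the choice to pad skipped positions with None are all assumptions made for this illustration.

```python
import re

# One path segment: a bare key optionally followed by a numeric index, e.g. b[1].
_SEGMENT = re.compile(r"([A-Za-z0-9_-]+)(?:\[(\d+)\])?")


def set_indexed(root, path, value):
    """Assign value at a path like 'a.b[1].b' (hypothetical syntax, not TOML)."""
    segments = path.split(".")
    node = root
    for i, raw in enumerate(segments):
        match = _SEGMENT.fullmatch(raw)
        if match is None:
            raise ValueError(f"unsupported segment {raw!r} in {path!r}")
        key, index = match.group(1), match.group(2)
        is_last = i == len(segments) - 1
        if index is None:
            if is_last:
                node[key] = value
            else:
                node = node.setdefault(key, {})
            continue
        items = node.setdefault(key, [])
        position = int(index)
        while len(items) <= position:
            items.append(None)  # pad so that items[position] exists
        if is_last:
            items[position] = value
        else:
            if items[position] is None:
                items[position] = {}
            node = items[position]
    return root


store = {}
set_indexed(store, "a.some_val", 1)
set_indexed(store, "a.b[0].b", 2)
set_indexed(store, "a.b[1].b", 3)
```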

@eksortso
Contributor
eksortso commented Jan 4, 2020

A flat format with nothing but dotted keys = values (or arrays) is appealing in theory. But in practical usage, flat formats would only work if you threw out all tables and arrays.

The files that we have seen so far are mostly TOML-compliant. But array indexes are not valid TOML, and, arguing for a human-usable format, I think they shouldn't ever be valid. I could suggest an incremental syntax, using [] and [+] (or maybe [[]], like this) instead of explicit numbers in brackets. But really, is it worth adding this extra syntax to an ostensibly minimal file format?

If all you want to do is define docker labels using TOML, couldn't you restrict the syntax that you use to what Docker recommends in their docs, which is pretty flat already?

So let me ask the devil's question: Why do you want to do this? Certainly I could be persuaded that it's worthwhile, but not until we see more convincing use cases.

@llacroix
Author
llacroix commented Jan 4, 2020

> But array indexes are not valid TOML, and, arguing for a human-usable format, I think they shouldn't ever be valid.

Agreed. I can't say how it should be done; [] and [+] could make sense to mark a new array element. I'm not proposing a syntax, though, just saying it would be nice to have one.

> Why do you want to do this?

As I said in the original post:

> One example where it would be very useful is docker labels and any kind of system using labels. Labels aren't structured, and by convention things are labelled using a dotted syntax. The nice thing about toml is that you can almost just load the values as is. With such an official syntax, toml parsers could parse inputs iteratively.

In docker and some other tools there are labels. Technically you can write a toml document as the value of a label, but being able to map a key to a particular value in a tree/map can be useful if you want only that value, or if you want to parse the labels as an actual object. That said, label names can't be duplicated, so a key-value store would have to keep indices or some other way to know which array index an entry belongs to.

In that case having [] and [+] wouldn't work, but having fully qualified indices like [1] would allow rebuilding an array. Note that if we started with [2], toml could be able to map [0] and [1]. One other alternative could be to manage things as an array map, so something like a.b.0.b would be possible. In theory it might already be possible, as an array in certain languages is simply an object with int keys, so doing obj['a']['b'][0].b would be valid. So the parser could interpret an int as an int directly, but a.b."1".b wouldn't create an array.

@eksortso
Contributor
eksortso commented Jan 4, 2020

> In that case having [] and [+] wouldn't work, but having fully qualified indices [1] would allow rebuilding an array,

Well, to be honest, I didn't explain what I had in mind for [] and [+]. The [] would indicate the current element of a table array, and the [+] would indicate that a new array element is now in scope. Read issue #309, or tl;dr. It's a pretty deep subject.

> Note that if we started with [2] toml could be able to map [0] and [1].

There isn't any notion of how arrays get numbered anywhere in the TOML spec. The indexing of arrays is undefined, so whether [2] defines [0] or [1] or anything else cannot be asserted.

Really, indexing arrays is not something that a config writer ought to be concerned with. At the very least, writers need to indicate when new elements of table arrays are introduced, hence my preference for an increment-based indicator, if we pursued this subject further.

> One other alternative could be to manage things as an array map, so something like a.b.0.b would be possible. In theory it might already be possible, as an array in certain languages is simply an object with int keys, so doing obj['a']['b'][0].b would be valid. So the parser could interpret an int as an int directly, but a.b."1".b wouldn't create an array.

That leads to a problem, though, because all the parts of TOML keys are, by definition, strings, including the ones that look like numbers. (This was brought up in #592.) For instance, a key a.b.0.b is identical to a.b."0".b. That's unlikely to change, unless that particular issue is reconsidered.

@pradyunsg
Member

What's the point of such a representation? How is it relevant to a configuration file format?

I'm happy to assume that such a format has its uses - why does TOML have to support it though?

@llacroix
Author
llacroix commented Jan 7, 2020

@pradyunsg I think I've been saying the same thing at least 3 times now... Here's a more concrete example:

Look at the configuration for traefik:

traefik.http.services.foo.loadbalancer.server.scheme "https"
traefik.http.services.foo.loadbalancer.server.port "8080"
traefik.http.services.foo.loadbalancer.healthcheck.interval "1s"
traefik.http.services.foo.loadbalancer.healthcheck.path "/check"

If we were to convert it to toml it would be basically this toml file

[traefik]
[traefik.http.services.foo.loadbalancer.server]
port = "8080"
scheme = "https"
[traefik.http.services.foo.loadbalancer.healthcheck]
path = "/check"
interval = "1s"

https://docs.traefik.io/routing/providers/docker/#port

This way, with a flat format, it would be easy to format object keys so that they can be used as label names.

The main advantage of such a format is that if you can serialize your keys like that, then you can also filter on the labels. So if you want any "traefik.http.services.*.loadbalancer.server.port" where the value is 8080, you can easily find it.

The opposite is also true: if you want to read the labels as an object, and the label names are formatted in a way that toml can parse, it would be possible to reconstruct an object from the key-value pairs.

The other thing is that, while traefik pretty much took its inspiration from toml to implement its labels, having a standard format for key-value stores could encourage people to use a common format rather than having multiple people roll their own custom flat formats.

But labels are not limited to docker/traefik, so I guess there's a lot of benefit in supporting data formatted for a key-value store, where keeping individual data in its own key is more convenient than having to serialize / deserialize a value completely.

A side effect of such a format is that you're not forced to parse it completely; it's also trivial to insert or remove values.

For example:

a.b.c = 1
a.b.d = 2
a.d.e = 3

If you want to remove "b", you only have to scan for "a.b" and remove all those lines. If you want to add "a.d.c", you can simply add a new line after "a.d.e".

I'd say it would be a machine friendly format.
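
As a rough sketch of those line-based operations, assuming one "key = value" pair per line and bare keys only. The function name remove_subtree and the value 4 for the new key are made up for the example. Matching on the prefix followed by a dot avoids accidentally removing a sibling such as "a.bc".

```python
def remove_subtree(lines, prefix):
    """Drop every 'key = value' line whose key is prefix or lies below it."""
    kept = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if key == prefix or key.startswith(prefix + "."):
            continue
        kept.append(line)
    return kept


lines = ["a.b.c = 1", "a.b.d = 2", "a.d.e = 3"]

# Removing "b" under "a" only needs a scan over the lines.
without_b = remove_subtree(lines, "a.b")

# Adding a key is a plain append; the value 4 is an arbitrary placeholder.
with_new_key = lines + ["a.d.c = 4"]
```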

@eksortso
Contributor
eksortso commented Jan 7, 2020

@llacroix I don't know where you're going with this line of reasoning, because I don't know what you're asking. The label/flat format that you want doesn't require anything other than a limited subset of TOML's features; namely, don't use tables in arrays. That restriction alone gets you a perfect match between fully flat keys in one format and a fully hierarchical structure in another. Both are valid TOML.

On top of that, the operations that you describe could be done by writing out the TOML with just dotted keys, then sorting them alphabetically, in hierarchical order. The end result would be convenient for the applications that you mentioned, but those rules don't need to be a full-fledged part of the TOML standard for you to use.

Let me mention a very similar class of objects: the advanced configurations in Mozilla Firefox. This is what you see when you type "about:config" into Firefox's address bar. Very similar in how the keys are arranged. Values can only be numerical or string or Boolean, although they hack in arrays and subtables by expressing them as strings. Also, you can search through the keys quite readily. Is this any closer to what you have been thinking of? If not, then what are we missing?
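
The "dotted keys, then sorted" approach described above could look roughly like this. It is a sketch under simplifying assumptions: the function name flatten is invented, every key is a bare key that needs no quoting, the structure contains no arrays, and every leaf is a simple string that needs no escaping.

```python
def flatten(table, prefix=""):
    """Yield (dotted_key, value) pairs for every leaf of a nested dict."""
    for key, value in table.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


# The traefik example from earlier in the thread, as a nested structure.
config = {"traefik": {"http": {"services": {"foo": {"loadbalancer": {
    "server": {"port": "8080", "scheme": "https"},
    "healthcheck": {"path": "/check", "interval": "1s"},
}}}}}}

# One dotted-key line per leaf, sorted so related keys end up adjacent.
lines = sorted(f'{key} = "{value}"' for key, value in flatten(config))
```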

@llacroix
Author
llacroix commented Jan 7, 2020

Yes, of course it can be done without being part of TOML. (I'm already doing that). But having support for arrays would prevent having to do special cases/hacks like in Mozilla Firefox.

One other way to support arrays could be to use arbitrary indices...

a.position[x] = 1
a.position[y] = 2
a.color[red] = 123
a.color[green] = 22
a.color[blue] = 100
b[obj].name = "obj 1"
b[obj2].name = "obj 2"

Giving:

{
  'a': {
    'position': [1, 2],
    'color': [123, 22, 100]
  },
  'b': [{'name': 'obj 1'}, {'name': 'obj 2'}]
}

Having arbitrary indices makes it possible to get a key directly by path, and elements are added in the order in which the arbitrary keys first appear. So you could use numbers or names to get to the value. Using numbers could be confusing, as the order in which they occur may not be the same as the resulting order. The issue I see with this is that there isn't a real way to format an array with arbitrary indices without generating them using some kind of unique identifier. So it's not ideal.

If it can be done without rolling my own superset of a toml subset, I'd rather implement a standard format than a custom one. Otherwise, hacking a solution never stopped anyone.
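
For illustration only, a sketch of how the arbitrary-index idea above might be interpreted: each new label in brackets appends an element, and a repeated label refers back to the element it created. None of this is TOML; the function name build and the restriction to bare keys and labels are assumptions.

```python
import re

# A segment is a bare key, optionally followed by an arbitrary label in brackets.
_SEGMENT = re.compile(r"([A-Za-z0-9_-]+)(?:\[([A-Za-z0-9_-]+)\])?")


def build(pairs):
    """Fold (path, value) pairs with labelled array slots into a structure."""
    root = {}
    slots = {}  # (path of the array, label) -> position of that label
    for path, value in pairs:
        node = root
        segments = path.split(".")
        for i, raw in enumerate(segments):
            match = _SEGMENT.fullmatch(raw)
            if match is None:
                raise ValueError(f"unsupported segment {raw!r} in {path!r}")
            key, label = match.groups()
            is_last = i == len(segments) - 1
            if label is None:
                if is_last:
                    node[key] = value
                else:
                    node = node.setdefault(key, {})
                continue
            items = node.setdefault(key, [])
            array_path = ".".join(segments[:i] + [key])
            slot = slots.setdefault((array_path, label), len(items))
            if slot == len(items):
                items.append({})  # first time this label is seen: new element
            if is_last:
                items[slot] = value
            else:
                node = items[slot]
    return root


result = build([
    ("a.position[x]", 1),
    ("a.position[y]", 2),
    ("a.color[red]", 123),
    ("a.color[green]", 22),
    ("a.color[blue]", 100),
    ("b[obj].name", "obj 1"),
    ("b[obj2].name", "obj 2"),
])
```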

@ChristianSi
Contributor

@llacroix Do you really have to mix tables and arrays in such an odd and unreadable way? Why not simply write

a.position.x = 1
a.position.y = 2
a.color.red = 123
a.color.green = 22
a.color.blue = 100
b.obj.name = "obj 1"
b.obj2.name = "obj 2"

That's perfectly supported by TOML's dotted-key syntax and is much more readable and logical too. Sounds like a win-win to me.

@eksortso
Contributor
eksortso commented Jan 9, 2020

> That's perfectly supported by TOML's dotted-key syntax and is much more readable and logical too. Sounds like a win-win to me.

@ChristianSi That is true enough, but it missed a point. I think that @llacroix wants a way to refer to array elements. The original example given was a little confusing, as far as that goes (using a concept of named tuples out of context).

But I'm interested now in the notion of how to address individual elements of a TOML document. Something for TOML that's akin to XPath for XML.

Even with a simple useful configuration, you could be dealing with elements that are at most three subtables deep. I'd love a way to pick out an element, or an iterable sequence of elements, that could be considered another part of the standard. Think of using config.get('servers.ports[]') in your favorite programming language instead of config['servers'] and then something['ports'] while iterating over the table array servers.

May work better as a parallel standard, though. And my example deliberately avoided mentioning array indexing. But this sort of thing would allow for a "flat" key structure.
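
One possible reading of that config.get('servers.ports[]') idea, sketched as a standalone function. The semantics here are an assumption, not something defined in this thread or in TOML: a key applied to an array of tables is looked up in every element, and a trailing [] expands matched arrays into their individual items. The name get_path and the sample data are invented.

```python
def get_path(config, path):
    """Return every value matched by a dotted path such as 'servers.ports[]'."""
    current = [config]  # all nodes matched so far
    for segment in path.split("."):
        expand = segment.endswith("[]")
        key = segment[:-2] if expand else segment
        matched = []
        for node in current:
            # Look the key up in each element when the node is an array of tables.
            tables = node if isinstance(node, list) else [node]
            for table in tables:
                if isinstance(table, dict) and key in table:
                    matched.append(table[key])
        if expand:
            # Replace each matched array by its individual elements.
            matched = [
                item
                for value in matched
                for item in (value if isinstance(value, list) else [value])
            ]
        current = matched
    return current


config = {
    "servers": [
        {"name": "alpha", "ports": [8001, 8002]},
        {"name": "beta", "ports": [8003]},
    ]
}
ports = get_path(config, "servers.ports[]")
names = get_path(config, "servers.name")
```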

@llacroix
Author
llacroix commented Jan 9, 2020

> That's perfectly supported by TOML's dotted-key syntax and is much more readable and logical too. Sounds like a win-win to me.

That's only a win-win if you can load the structure from the file directly where you need it and save it back without any post-processing. It works when you have the choice, but it would fail, or require a bit of boilerplate to convert tables to arrays and arrays to tables when loading a file, when the format used by a library you don't control uses arrays instead of tables.

It's essentially the difference between config.get(key) and convert(config.get(key, {}).get(key2, {})[3].get(key3, {})....) when reading, and between config.set(key, value) and config[key][key2][3][key3] = convert_back(value) when writing.

As @eksortso said, this tends toward having something similar to XPath.

Interestingly, it could be something like JsonPath and JsonPath Expressions.

@pradyunsg
Member

TOML is meant to be human editable, and your proposal is geared toward making it easier for a line-by-line scanner to compose a mapping. I haven't seen any compelling argument for why TOML should stretch to support such a representation beyond "it would be a machine friendly format", which isn't a strong argument.

Could you explain how such a representation would be useful for TOML's intended use case of being a human editable configuration file format?

If it's not directly intended for configuration files but rather toward unambiguous access, as has been hinted, I welcome folks to work on this as an effort outside of TOML's specification (much like how JSONPath is a separate effort and specification from JSON). I think that'd probably be the best approach for this as well. I genuinely doubt that there's much point in reinventing JSONPath but who knows.

@pradyunsg pradyunsg changed the title from "Fully Qualified Attribute Syntax or flat format" to "Dotted syntax for arrays" on Mar 5, 2022
@pradyunsg
Member

Closing this since I don't think this is particularly valuable.

@pradyunsg pradyunsg closed this as completed Mar 5, 2022