8000 Bring sets in line with language spec and enable by default · Issue #584 · google/starlark-go · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Bring sets in line with language spec and enable by default #584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
tetromino opened this issue Mar 18, 2025 · 9 comments
Open

Bring sets in line with language spec and enable by default #584

tetromino opened this issue Mar 18, 2025 · 9 comments

Comments

@tetromino
Copy link
Collaborator
tetromino commented Mar 18, 2025

Now that sets are part of the Starlark language spec (see bazelbuild/starlark#290), we should bring starlark-go's implementation in line with the spec and enable them by default.

Specifically:

  • disable even no-op mutation attempts on frozen/iterating sets by add() and discard()
  • disable comparison operator for sets (or make it flag-guarded?)
  • require both sides of |, &, -, and ^ operators (and their augmented forms) to be sets if lhs is a set
  • add multi-argument form of union(), intersection(), difference(), update()
  • add isdisjoint(), intersection_update(), difference_update(), symmetric_difference_update()

@aranguyen @brandjon FYI

@adonovan
Copy link
Collaborator

I agree, and have been meaning to do this (though would welcome a CL). However, I'm worried that it might be a breaking change for some users, so I propose that we use the existing -set flag to mean "legacy set behavior" and the absence of that flag to mean "standard set behavior". Then we can deprecate the flag.

@tetromino
Copy link
Collaborator Author

I'd suggest flag-guarding the differences in operator behavior (comparison, |, etc.), but not the differences in behavior for no-op mutation attempts on frozen sets (which seems like an edge case).

@adonovan
Copy link
Collaborator

I agree the frozen thing is an edge case we can ignore.

After further reflection, it's not clear to me how to implement flag guarding for the relevant operators since the flag is conceptually a compile-time (and thus per-file) setting, not a per data structure setting nor a per thread setting. Operators don't have access to the files from which they came (nor the thread that calls them, for that matter); also, binary operators and functions have to make a choice about what to do in the (admittedly unlikely) event that exactly one but not both of their operands is covered by the flag semantics. (The interesting case of x|y is when only the left operand is a set, perhaps finessing that question.)

I'm not sure we can do much more than emit a dynamic warning message: the first time a deprecated operation is called we log a message.

@brandjon
Copy link
Collaborator

In Bazel this is gated with BuildLanguageOptions / StarlarkSemantics, which is more or less global to the invocation. Why can't the Go interpreter ac 8000 cess this as global (per-interpreter) state? Or in the worst case, truly global (per-binary) state? That would be enough to guard against backsliding after an LSC within Google. Not sure what the backward compatibility concerns are for other applications.

If you really do need to make it per-file or per-data structure, what about creating an internal second variant of the set datatype and having the flag gate which version you get via set-producing operations (in particular, the set() builtin)?

@adonovan
Copy link
Collaborator
adonovan commented Mar 19, 2025

In Bazel this is gated with BuildLanguageOptions / StarlarkSemantics, which is more or less global to the invocation. Why can't the Go interpreter access this as global (per-interpreter) state?

The Go implementation of BinaryOp (for example) doesn't accept a StarlarkSemantics, but if it had done so, then it would be a reasonable way to implement the desired behavior: of course, the behavior of the operator would be determined by the Semantics of the thread executing the operation, if any, or by an explicit semantics if called outside any thread.

(Grumbly aside: at one point I succeeded in making the Java implementation of binaryOp not depend on StarlarkThread, but unfortunately it seems to have since backslid so that operations can no longer be evaluated outside the context of a thread.)

Or in the worst case, truly global (per-binary) state?

The Go implementation used global variables in the past but--as always happens eventually--we regretted it and switched to explicit passing of a FileOptions parameter. (The legacy code continues to work by immediately creating a FileOptions from a snapshot of the global variables' current state.)

If you really do need to make it per-file or per-data structure, what about creating an internal second variant of the set datatype and having the flag gate which version you get via set-producing operations (in particular, the set() builtin)?

Unfortunately the Set data type is part of the public API, so it would be disruptive to create two different flavors Set, (the legacy one) and NewSet. One could represent the two flavors as a boolean inside Set. The question is: how does it get its value? When the set is constructed by a set() or x.union(y)call in a file, clearly the answer is "it comes from the file's options", i.e. reflection one level up the call stack. But what about the set returned by set ⊕ set for various binary operators: does it come from the left or the right operand? Or a combination (AND or OR) of both? There are various pros and cons. Perhaps I'm overthinking it. One could certainly warn if old and new sets are mingled in a single application.

@brandjon
Copy link
Collaborator

Most Starlark language changes in google3 have not required interoperability between the old and new semantics within the same invocation. The changes in question here seem minor enough that we should be able to migrate offending code within google3, and flip a flag that applies globally to the interpreter. I would expect backsliding to be unlikely and addressable with forward-fixes.

But this might not be possible via hacking in a global variable if, for instance, there are multiple usages of the interpreter embedded into the same binary that correspond to separate states of the code base. I don't know whether that's the case for any users either inside or outside of Google.

(Grumbly aside: at one point I succeeded in making the Java implementation of binaryOp not depend on StarlarkThread, but unfortunately it seems to have since backslid so that operations can no longer be evaluated outside the context of a thread.)

Looking back at the code history, this seems to be support a toolchain-related API being able to resolve repository labels correctly -- the label is given as the key in the index expression, and the repo mapping information is obtained from the application data stored in the thread. I guess the alternative would've been to bake a reference to this information into the toolchain-api object at its construction.

There's a number of application-defined operations in Bazel that require a thread for evaluation, label resolution being chief among them. I think the difference here is whether those operations are just function calls, or extend to overloaded operators too. I don't know that we lose much by, say, not being able to guarantee that evaluation of a[b] is possible in all isolated environments, where a is a given arbitrary non-trivial object. That's already the case for a(b).

@adonovan
Copy link
Collaborator

Most Starlark language changes in google3 have not required interoperability between the old and new semantics within the same invocation. The changes in question here seem minor enough that we should be able to migrate offending code within google3, and flip a flag that applies globally to the interpreter. I would expect backsliding to be unlikely and addressable with forward-fixes.
But this might not be possible via hacking in a global variable if, for instance, there are multiple usages of the interpreter embedded into the same binary that correspond to separate states of the code base. I don't know whether that's the case for any users either inside or outside of Google.

Multiple independent uses of the Go-based interpreter within a single executable are indeed likely to be rare. Given the API constraints (which don't plumb a "semantics" everywhere) I don't think we're going to be able to provide completely smooth transitions. A global variable may be the best we can do.

@brandjon
Copy link
Collaborator

The sets feature was always considered experimental for Starlark-Go, no? Wouldn't the global var only live in the codebase temporarily?

@adonovan
Copy link
Collaborator

The sets feature was always considered experimental for Starlark-Go, no? Wouldn't the global var only live in the codebase temporarily?

It was non-standard, and behind a flag, but Starlark/Go, like well behaved Go modules in general, has no such concept of experimental features: if an API is published, removing it is a breaking change, and we don't make breaking changes without changing the major version number of the module, which is the same as renaming the module.

tetromino added a commit to tetromino/starlark-go that referenced this issue Mar 26, 2025
…forms

Also update the docs and tests.

Working towards google#584
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
0