8000 SPICE-0012: URL standard library by bioball · Pull Request #13 · apple/pkl-evolution · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

SPICE-0012: URL standard library #13

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

bioball
Copy link
Member
@bioball bioball commented Jan 30, 2025

No description provided.

@StefMa
Copy link
StefMa commented Jan 30, 2025

It seems that "search params" are the correct technical wording. However I think if we would go with just Params it would be better understandable... 🤔 Maybe at least introducing an typealias for this? 🤔

Or are search params different to queries?

Just some random thoughts 🤷‍♂️ I've never seen SearchParams in Java/Kotlin yet (as an Android developer). Maybe others have more use for that wording...

Copy link
Contributor
@stackoverflow stackoverflow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I Like the design. Left some questions in the comments.

/// The username component.
///
/// If the URL does not require a username, set to the empty string.
username: AsciiString
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason to make the empty string the "null" value instead of proper null like port, path, etc.?
I'd rather go with AsciiString?.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. According to the spec, this field (and several other ones) do not admit null:

https://url.spec.whatwg.org/#url-representation

There are some fields that can be null, but this is not one of them.

Practically, I don't know if this makes any difference.

/// The password component.
///
/// If the URL does not require a password, set to the empty string.
password: AsciiString
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above.

/// Parses [source] into a URL.
///
/// Throws if [source] is an invalid URL.
external function parse(source: String): Url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with having one version that throws on failure, for when users know their URL is correct. But we should have parseOrNull(source: String): Url? for when you want to recover from errors.

Copy link
Member Author
@bioball bioball Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our other parsers don't have a parseOrNull, so, this follows the same precedent (see yaml.Parser and json.Parser).

If you need to recover from errors, you can use test.catch

Copy link
Member Author
@bioball bioball Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After chatting a little bit more on this, I'm convinced that we should just have a parseOrNull here, updated.

The rationale is: we also have similar "OrNull" methods in other places. It makes it much easier for users; e.g. Uri.parseOrNull(input) ?? Uri.parse("https://example.com")

We should have the YAML and JSON parsers conform to this as a future task.

else null

/// Creates a [SearchParams] from the given form encoded string.
const function SearchParams(input: String): SearchParams = // etc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible input is not a valid query string? If so, what happens then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question; I don't think so.

Here is the parsing algorithm: https://url.spec.whatwg.org/#urlencoded-parsing

None of these steps involve failure.

/// ```
/// encode("https://example.com/some path/") == "https://example.com/some%20path"
/// ```
const external function encode(value: String): String
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const external function encode(value: String): String
const external function encodeUri(value: String): String

encode is a bit too generic IMO, and it was hard to understand the difference between it and percentEncode at first glance.
Could also be called encodeForUri, encodeUriString, etc.

Copy link
Member Author
@bioball bioball Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module's name is already Url, so, usage looks like Url.encode. I think Url.encodeUri would be a little too wordy.

By the way, one of the suggestions of this SPICE is to prefer "URL" over "URI".

----
import "pkl:Url"

myUrl: Url = new { // <1>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse will throw (or return null) on an invalid URL. What happens when you create an invalid URL with new? Will it throw?
We could have a UrlBuilder that allows you to create a URL with new and call toUrl(orNull) to validate it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can ensure that you cannot construct invalid Urls through property types and constraints.

@bioball
Copy link
Member Author
bioball commented Jan 30, 2025

@StefMa: this API is taken from the URL spec, see:

https://url.spec.whatwg.org/#urlsearchparams

By the way, Java's stdlib doesn't have a library for parsing the query. However, some other languages do, and call them different things:

Technically, the query string portion of a URL is just a string (no structure). Representing structured data using & and = characters is URL form encoding.

@KushalP
Copy link
KushalP commented Jan 30, 2025

Documenting something here that we spoke about: will merging this into main require codegen implementations to exist as well?

@HT154
Copy link
Contributor
HT154 commented Jan 30, 2025

From an ergonomic standpoint, I would hope that Pkl's Url type produces language/stdlib-native URL types in each target language. Pretty sure would require changes in both the code generators and the Unmarshal/Decoder implementations.

@bioball
Copy link
Member Author
bioball commented Jan 31, 2025

Good point; I need to fill in some details about language bindings. And, I agree, it should probably turn into the target language's URL type.

@HT154
Copy link
Contributor
HT154 commented Jan 31, 2025

I wonder if it makes sense to expand read to accept Url values as well. Many of the places I'm using the experimental URI library today pass the "rendered" URI directly to read. This obviously isn't a huge impact, but might be a nice little bit of sugar.

bioball and others added 3 commits January 31, 2025 09:49
Co-authored-by: Islon Scherer <islonscherer@gmail.com>
Co-authored-by: Islon Scherer <islonscherer@gmail.com>
* Add `parseOrNull` to the parser
* Add module-level `parse` and `parseOrNull`
* Add section describing language bindings, and binary format
* Add `value` property
@bioball
Copy link
Member Author
bioball commented Jan 31, 2025

I wonder if it makes sense to expand read to accept Url values as well. Many of the places I'm using the experimental URI library today pass the "rendered" URI directly to read. This obviously isn't a huge impact, but might be a nice little bit of sugar.

We possibly want to make read() also only accept string literals in the future (where exsiting dynamic reads would be replaced with a "run" mode of some sort). Without this, we can't statically analyze a module and determine all of its dependencies. So, with that in mind, I would rather keep read() in its current form.

.pkl.Url
[source,pkl]
----
module pkl.Url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would a URI like env:USER be represented by this module?

In go, this is parsed as

url.URL{
  Scheme: "env",
  Opaque: "USER",
}

F438

Copy link
Member Author
@bioball bioball Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be parsed as new URL { scheme = "env"; path = "USER" }.

See https://url.spec.whatwg.org/#example-url-components

Actually, I wonder how we can better support opaque URLs. The env URL env://foo/bar represents an env var with value //foo/bar, and doesn't mean "the host is foo and the path is /bar".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to proffer Go's URL implementation as a carrot and Swift's as the stick:

Go

Go's url.Parse function does (roughly) this:

  1. Split off the fragment, if present, and decode/store it
  2. Split off the scheme, if present
  3. Split off the query, if present, and store it
  4. If a scheme is present and the character immediately following the : is not /, store the remainder in Opaque and return
  5. Otherwise, proceed to parse the authority and path

url.URL.String produces a string from a URL struct. When Opaque is non-empty, the result has form scheme:opaque?query#fragment, otherwise it's scheme://userinfo@host/path?query#fragment.

In practice, I've found this API to be extremely usable and flexible enough for working with the varied URIs found in the Pkl ecosystem. This may be a good design to reference that departs somewhat from the literal spec in the name of usability.

Swift

Having worked with pkl-swift, I've found Foundation URL type's lack of similar support makes working with opaque URIs like env:USER significantly less ergonomic. Parsing just the "USER" out requires error-prone mangling of the full URL string, eg. https://github.com/apple/pkl-swift/blob/c280a0e6324097c429bfc19a2cfe5761ab12bbe9/docs/modules/ROOT/pages/external-readers.adoc?plain=1#L33) to get the equivalent of Go's url.URL.Opaque.

Pkl

Given that opaque URIs are already commonplace in the Pkl ecosystem (and used for several of the runtime's built-in resources), I think having first-class support for them in this proposal is necessary. A similar model to Go's could be adopted, but with added type constraints so ensure that opaque is never non-null at the same time as hostname/port/path`.

A case like env://foo/bar (or env:/foo/bar) is still tricky (and go parses them "wrong"). I wonder if it would be reasonable to do something like this:

opaque: String = "<rendered authority/path>"

Url.toString() would then use this field and ignore the authority/path field. This would allow direct construction of the Url to directly set opaque while still maintaining the same behavior as non-opaque URLs. Similarly, during parsing, if an authority/path are found (same criteria as Go) then those fields are populated, otherwise opaque is set directly.

Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You actually need to do wrangling too with Go's url API for opaque URLs. For example, to turn a URL like env://foo%20/bar into //foo /bar, you'd need:

func getSchemeSpecificPart(u *url.URL) (string, error) {
  return url.PathUnescape(strings.Split(u.String(), ":")[1])
}

Whether a URL is opaque or not is really up to the scheme, and a simple rule like "if the char after the colon is /, it's opaque" doesn't feel good enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0