GBFS ID type definition · Issue #541 · MobilityData/gbfs · GitHub

GBFS ID type definition #541

Closed
1 task done
tdelmas opened this issue Sep 13, 2023 · 3 comments · Fixed by #545
Labels
Moved to PR, proposal:breaking, v3.0-RC2 (Candidate change for GBFS 3.0 (Major release) - 2nd pass)

Comments

@tdelmas
Contributor
tdelmas commented Sep 13, 2023

What is the issue and why is it an issue?

The current definition of ID in https://github.com/MobilityData/gbfs/blob/master/gbfs.md#field-types is:

  • "a string" that "MUST NOT contain spaces"

This definition has not changed since the first draft in 2016 (#5):

must not contain spaces

There are multiple problems with that definition:

  • A precision problem: what is a space? It is not defined here, and since any character that JSON can encode is valid, the definition is not straightforward.

  • A purpose problem: why are spaces forbidden? What was the goal?

  • A compatibility problem: allowing any character (except spaces) may cause issues:

    • Characters outside the Basic Multilingual Plane need to be represented differently when escaped in JSON, as two escaped code units \uXXXX\uXXXX
    • Two different code point sequences may represent the same thing depending on the encoding path, and may need normalization before they can be compared (https://en.wikipedia.org/wiki/Unicode_equivalence)
    • Non-ASCII characters may cause compatibility problems during transport and storage
    • Manual debugging for producers and integrators may be harder (IDs become hard to read, write, and copy without errors)
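
The three technical points above can be reproduced with the Python standard library. This is only an illustrative sketch: the sample characters and strings are chosen for the demonstration and do not come from any GBFS feed.

```python
import json
import unicodedata

# "Space" is ambiguous: all of these count as whitespace for str.isspace(),
# but only the first one is U+0020.
candidates = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "EM SPACE": "\u2003",
    "CHARACTER TABULATION": "\t",
}
whitespace_flags = {name: ch.isspace() for name, ch in candidates.items()}

# A character outside the Basic Multilingual Plane becomes a surrogate pair
# (two \uXXXX escapes) when serialized with json.dumps' default ASCII escaping.
escaped = json.dumps("\U0001F6B2")  # U+1F6B2 BICYCLE

# Two code point sequences that render identically but compare unequal
# until both are normalized to the same form.
composed = "caf\u00e9"     # "é" as a single code point
decomposed = "cafe\u0301"  # "e" followed by COMBINING ACUTE ACCENT
equal_raw = composed == decomposed
equal_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(whitespace_flags)
print(escaped)
print(equal_raw, equal_nfc)
```

A consumer that stores IDs as received would treat `composed` and `decomposed` as two different IDs unless it normalizes both first.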

Current usage of IDs (based on systems.csv)

Of the 826 systems declared, I analyzed 798 (the others are dead or broken).

IDs from free_bike_status, station_status, vehicle_types and system_pricing_plans were analyzed.

The IDs of those systems are composed of:

  • . (dot) for 1 system (sharedmobility.ch)
  • @ (at sign) for 2 HelBiz systems ("bike_id" like "P8S0C2149C0011@SD02" in https://gbfs.helbiz.com/v2.2/miamilakes/free_bike_status.json)
  • (space !!!) for 7 systems (mostly in vehicle types)
    • 5 publicbikesystem.net systems in vehicle_type (e.g. "vehicle_type_id": "E Scooter")
    • openov.nl in station_information (one station has the id "GR002 ", which is probably an error, as all other stations follow the same pattern without a space in https://gbfs.openov.nl/ovfiets/station_information.json)
    • sharedmobility.ch in plan_id, such as "emobility:83106 return" and "emobility:2306 oneway"
  • : (colon) for 34 systems (sharedmobility.ch and Entur)
  • All other systems only use 0-9, a-z, A-Z, - and _
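
The kind of tally shown above can be sketched as follows. This is not the script used for the analysis (which is not included in this issue); the function name `unusual_characters` is hypothetical, and the sample IDs are the ones quoted in the list.

```python
import re
from collections import Counter

# Baseline set that, per the list above, most systems stay within.
# The trailing "-" inside the character class is a literal hyphen.
BASELINE = re.compile(r"[0-9a-zA-Z_-]")

def unusual_characters(ids):
    """Count characters outside 0-9, a-z, A-Z, '-' and '_' across IDs."""
    counts = Counter()
    for id_ in ids:
        for ch in id_:
            if not BASELINE.fullmatch(ch):
                counts[ch] += 1
    return counts

sample_ids = [
    "P8S0C2149C0011@SD02",    # bike_id quoted above
    "E Scooter",              # vehicle_type_id quoted above
    "GR002 ",                 # station id with a trailing space, quoted above
    "emobility:2306 oneway",  # plan_id quoted above
]
report = unusual_characters(sample_ids)
print(report)
```

Running the same function over every ID of a feed gives a quick view of which characters a given restriction would reject.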

Please describe some potential solutions you have considered (even if they aren’t related to GBFS).

The specification should clearly state what constitutes a valid ID. Some solutions are presented below:

  • Any Unicode character
    • Advantages: least restrictive
    • Drawbacks: comparison, encoding and compatibility issues
  • Any Unicode character except a specific list (Spaces? Non-printable characters?)
    • Advantages: most compatible with the current specification
    • Drawbacks: comparison, encoding and compatibility issues
  • Any printable ASCII character (https://en.wikipedia.org/wiki/ASCII#Printable_characters)
    • Advantages: high compatibility
    • Drawbacks: a few escaping issues, restrictive compared to the current specification
  • A restricted list of printable ASCII characters
    • Advantages: high compatibility
    • Drawbacks: a few escaping issues, restrictive compared to the current specification
  • [A-Z][a-z][0-9]_-.:@
    • Advantages: very high compatibility (791/798 : 99.1%, not 100% because it excludes the 7 that are using spaces)
    • Drawbacks: restrictive compared to the current specification
  • [A-Z][a-z][0-9]_-: (789/798 : 98.8%)
    • Advantages: high compatibility, no encoding or escaping issue
    • Drawbacks: restrictive compared to the current specification
  • [A-Z][a-z][0-9]_- (756/798: 94.7%)
    • Advantages: high compatibility, no encoding or escaping issue
    • Drawbacks: restrictive compared to the current specification
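
The three explicit character sets above are written informally; as regular expressions, each would be a single character class. The sketch below is one possible reading of them, with the names `strict`, `with_colon` and `permissive` invented here for illustration.

```python
import re

# One character class per candidate set. "-" is placed last in each class
# so that it is read as a literal hyphen, not as a range.
CANDIDATES = {
    "strict": re.compile(r"[A-Za-z0-9_-]+"),        # [A-Z][a-z][0-9]_-
    "with_colon": re.compile(r"[A-Za-z0-9_:-]+"),   # [A-Z][a-z][0-9]_-:
    "permissive": re.compile(r"[A-Za-z0-9_.:@-]+"), # [A-Z][a-z][0-9]_-.:@
}

def accepted_by(id_):
    """Return the names of the candidate sets that accept the given ID."""
    return [
        name
        for name, pattern in CANDIDATES.items()
        if pattern.fullmatch(id_)
    ]

for example in ["abc_123-X", "emobility:2306", "P8S0C2149C0011@SD02", "E Scooter"]:
    print(repr(example), accepted_by(example))
```

Using `fullmatch` rather than `search` matters here: an ID is only accepted when every character belongs to the set, and the `+` quantifier also rejects the empty string.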

As 3.0 is a major breaking version, it is the perfect moment to restrict the character range of IDs.
Restricting them to the strictest proposal, [A-Z][a-z][0-9]_-, seems to be a reasonable choice:

  • It makes it simpler to store, transmit and compare IDs without any compatibility problem
  • They could be used in URIs without any issues (both web and apps)
  • Systems with non-compatible IDs could simply encode them in Base64url to be compatible

A less restrictive choice, given current usage, could be [A-Z][a-z][0-9]_-.:@ or something in between. (: and @ are safe in most situations, but may be interpreted differently in URIs.)

Comparisons with IDs in other systems:

ID - [...] is a sequence of any UTF-8 characters. Using only printable ASCII characters is recommended. [...]

A Feature object's "id" member is a string or number

Technically, the value for an id attribute may contain any character, except whitespace characters. However, to avoid inadvertent errors, only ASCII letters, digits, '_', and '-' should be used, and the value for an id attribute should start with a letter.

References:

All Unicode characters may be placed within the
quotation marks, except for the characters that MUST be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).

Is your potential solution a breaking change?

  • Yes
@testower
Contributor

Just to explain Entur's usage of ":": we have adopted the Transmodel/NeTEx ID convention for data separation purposes, with the format [codespace]:[type]:[identification]. We would favor a solution that continues to allow ":" in IDs for this reason.

See https://enturas.atlassian.net/wiki/spaces/PUBLIC/pages/728563782/General+information+NeTEx#Definitions and https://enturas.atlassian.net/wiki/spaces/PUBLIC/pages/1883439205/Mobility+Data+Collection+-+GBFS+v2.2-v2.3#IDs
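
As an illustration of why ":" matters for this convention, the format [codespace]:[type]:[identification] can be split with a few lines of Python. This is a sketch only: the names `NamespacedId` and `parse_namespaced_id` and the sample ID are hypothetical, and allowing further colons inside the identification part is an assumption, not something stated in this thread.

```python
from typing import NamedTuple

class NamespacedId(NamedTuple):
    codespace: str
    entity_type: str
    identification: str

def parse_namespaced_id(value: str) -> NamespacedId:
    """Split an ID of the form [codespace]:[type]:[identification]."""
    # Assumption: only the first two colons are separators, so the
    # identification part may itself contain colons.
    parts = value.split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a [codespace]:[type]:[identification] ID: {value!r}")
    return NamespacedId(*parts)

parsed = parse_namespaced_id("XYZ:Vehicle:abc-123")  # hypothetical ID
print(parsed.codespace, parsed.entity_type, parsed.identification)
```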

richfab added the proposal:breaking and v3.0-RC2 (Candidate change for GBFS 3.0 (Major release) - 2nd pass) labels on Sep 15, 2023
@richfab
Contributor
richfab commented Sep 15, 2023

Thank you @tdelmas for the thorough analysis of the known feeds and for the detailed suggestions!
I am curious to know what other GBFS producers and consumers think about this. If anyone has a preferred solution, feel free to participate in the discussion!

tdelmas added a commit to tdelmas/gbfs that referenced this issue Sep 20, 2023
This PR clarifies and restricts the characters allowed in IDs. See MobilityData#541 for additional details.

Those edits have multiple goals:
- Clarify the current specification that is too vague
- Ensure interoperability, it MUST be possible to store and compare IDs simply
- Ensure that IDs are easy for humans to manipulate: even though the format is designed for machines, IDs are often used by humans when debugging, so they SHOULD be easy to read and write, regardless of the keyboard layout or system used.

With that specification, all IDs of the systems listed in `systems.csv` should stay compliant (except those using spaces, `0x20`, which are already non-compliant with the current specification).

Also, all other shared mobility systems that I have seen (GBFS or not) are compatible with both restrictions (again, except for spaces in a case that looks like an error).

Does anyone think that IDs should be restricted further, such as `A-Za-z0-9_-:`? (Consider that existing systems that use `.@/` remain compliant, because "SHOULD" means "that there may exist valid reasons in particular circumstances to ignore a particular item", and at least "legacy" may fit into that.)

Fix MobilityData#541
@josee-sabourin
Contributor

Closing this thread since a PR has been opened for this at #545, please continue the discussion there!
