[DRAFT] UCS-2 needs to be UTF-16 now by msdemlei · Pull Request #68 · ivoa-std/VOTable · GitHub

[DRAFT] UCS-2 needs to be UTF-16 now #68

Open. msdemlei wants to merge 2 commits into base: master


msdemlei (Contributor) commented Jun 11, 2025

UCS-2 as such is basically not implemented anywhere any more. It's all UTF-16, and I say we need to acknowledge that.

Regrettably, the variable-length encoding of UTF-16 won't work for us because we need fixed lengths for the strings in VOTable BINARY2. That's why I have a TODO in here.
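To make the length problem concrete, here is a minimal Python sketch (not part of the PR; the helper name `utf16_code_units` is made up for illustration) that compares the number of code points in a string with the number of 16-bit code units its UTF-16 encoding needs:

```python
def utf16_code_units(s: str) -> int:
    """Number of 16-bit code units needed to encode s as UTF-16."""
    # "utf-16-be" emits no byte order mark, so every 2 bytes is one code unit.
    return len(s.encode("utf-16-be")) // 2


bmp_only = "f\u00fcr"        # three characters, all inside the BMP
with_emoji = "a\U0001F4A9"   # U+1F4A9 lies outside the BMP

print(len(bmp_only), utf16_code_units(bmp_only))      # 3 3
print(len(with_emoji), utf16_code_units(with_emoji))  # 2 3
```

As soon as a character outside the BMP shows up, the two counts diverge, so a field declared as N characters no longer occupies a fixed 2·N bytes.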

We could require parsers to read the UTF-16 strings and identify surrogate pairs, but that would be terrible in all ways.
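For illustration, this is roughly the work such a parser would have to do. The sketch below is an assumption-laden example, not anything proposed in the PR: it assumes big-endian code units and well-formed input, and the function name `bytes_for_code_points` is hypothetical.

```python
import struct


def bytes_for_code_points(buf: bytes, offset: int, n: int) -> int:
    """Return how many bytes, starting at offset, hold n code points.

    Assumes big-endian, well-formed UTF-16 (no unpaired surrogates).
    """
    pos = offset
    for _ in range(n):
        (unit,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: the low surrogate that follows belongs
            # to the same code point, so skip it as well.
            pos += 2
    return pos - offset


data = "a\U0001F4A9b".encode("utf-16-be")
print(bytes_for_code_points(data, 0, 3))  # 8, not 6
```

The field length can only be found by walking the data, which is exactly what a fixed-length binary layout is supposed to avoid.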

To get out of this fix, we could say that arraysize represents the encoded length rather than the number of Unicode code points. I think I'd consider that reasonable.
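A minimal sketch of what that could look like on the writing side. Several things here are assumptions for illustration only: that "encoded length" means the number of 16-bit code units, that code units are big-endian, that short values are padded with NUL code units, and the function name `encode_fixed_utf16` itself.

```python
def encode_fixed_utf16(s: str, arraysize: int) -> bytes:
    """Encode s into exactly arraysize UTF-16 code units (2 * arraysize bytes)."""
    raw = s.encode("utf-16-be")
    width = 2 * arraysize
    if len(raw) > width:
        # Refuse rather than truncate: cutting at a fixed byte offset
        # could split a surrogate pair in half.
        raise ValueError(
            f"value needs {len(raw) // 2} code units, arraysize is {arraysize}"
        )
    # Pad with NUL code units (assumed padding convention).
    return raw.ljust(width, b"\x00")


field = encode_fixed_utf16("a\U0001F4A9", 4)
print(len(field))  # 8
```

With this reading, a reader can always consume 2·arraysize bytes per field without inspecting the content, at the price that arraysize no longer tells you how many characters fit.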

Alternatively, we say "you can't have non-BMP characters in unicodeChar and hence no surrogate pairs. VOTable parsers must fail when they are asked to encode anything outside of the BMP or containing surrogate characters". Hm 💩. For clarity, let me stress that basically all emojis are outside of the BMP.
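As a rough sketch of what that restriction would mean for a writer (the function name `check_bmp_only` and the choice of `ValueError` are assumptions, not anything specified in the PR):

```python
def check_bmp_only(s: str) -> str:
    """Return s unchanged if every character is a non-surrogate BMP code point."""
    for i, ch in enumerate(s):
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"character at index {i} (U+{cp:04X}) is outside the BMP")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"character at index {i} (U+{cp:04X}) is a surrogate")
    return s


check_bmp_only("f\u00fcr")          # passes
try:
    check_bmp_only("\U0001F4A9")    # U+1F4A9 is outside the BMP
except ValueError as exc:
    print(exc)
```

Under this rule every accepted character encodes to exactly one 16-bit code unit, so the fixed-length layout keeps working, but any value containing an emoji is rejected outright.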

See also https://wiki.ivoa.net/internal/IVOA/InterOpJune2025Apps/unicode-notes.pdf and bug #69.

msdemlei added 2 commits June 11, 2025 13:13
But that won't work easily as we can no longer reliably compute the
length of such fields, at least not without parsing them.

So, there's a TODO in here.

See also https://wiki.ivoa.net/internal/IVOA/InterOpJune2025Apps/unicode-notes.pdf