Determine width of ambiguous characters by language [SF:patches:155] · Issue #637 · chrisjsewell/docutils · GitHub

Determine width of ambiguous characters by language [SF:patches:155] #637

Open
chrisjsewell opened this issue Aug 9, 2020 · 0 comments
Comments

@chrisjsewell
Owner

author: tk0miya
created: 2019-04-13 03:44:18.434000
assigned: None
SF_url: https://sourceforge.net/p/docutils/patches/155

attachments:

In the Unicode specification, some characters are categorized as "ambiguous". Their width depends on the locale: in some locales (mainly East Asian), they are treated as wide. (For details, see the "East Asian Width" report: http://www.unicode.org/reports/tr11/#Definitions)

This patch lets docutils determine the width of these "ambiguous" characters from the language setting, and it includes the setting for Japanese.

This allows tables containing ambiguous characters (e.g. Cyrillic characters and symbols) to be parsed correctly.
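
To illustrate the problem, here is a minimal sketch (not the code from the patch) that measures column width with Python's standard `unicodedata` module. The function name `column_width` and the `ambiguous_wide` flag are assumptions made for this example; the patch itself derives the behaviour from the language setting.

```python
import unicodedata


def column_width(text, ambiguous_wide=False):
    """Return the number of fixed-width columns that `text` occupies.

    `ambiguous_wide` is a hypothetical switch for this sketch: when True,
    characters in East Asian Width category "A" (ambiguous) count as 2.
    """
    width = 0
    for char in text:
        if unicodedata.combining(char):
            # Combining marks are drawn on the preceding character.
            continue
        category = unicodedata.east_asian_width(char)
        if category in ("W", "F"):
            width += 2  # wide and fullwidth characters
        elif category == "A" and ambiguous_wide:
            width += 2  # ambiguous, treated as wide on request
        else:
            width += 1
    return width


# Cyrillic letters are in the "ambiguous" category, so the measured width
# of the same cell text differs between the two modes.
narrow = column_width("Привет")
wide = column_width("Привет", ambiguous_wide=True)
```

If a table border was typed in an editor that renders these characters double-width, only the second measurement lines up with the border, which is why the parser needs to know which mode applies.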


commenter: milde
posted: 2019-06-24 15:36:40.843000
title: #155 Determine width of ambiguous characters by language

Thank you for the patch.
I have some suggestions:

  • Do not specify "east_asian_width" for languages where it is the default ('WF'). Fall back to this default for languages that do not override it (or "East Asian" locales that use 'WF').

  • If "ja" (Japanese) is the only language where "ambiguous width characters" should be treated as wide (i.e. where it is likely that the program used to edit rST sources uses a fixed-width font with double-width glyphs for these characters), we may handle this exception in "states.py" or "statemachine.py". If Chinese and/or Korean also imply "ambiguous width -> wide", specifying the "east_asian_width" setting in parsers/rst/languages/.py (or languages/.py?) seems better.
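
The fallback suggested in the first point could look roughly like the sketch below. It reads 'WF' as "the categories W and F count as wide", which is an interpretation of the comment above. The table `WIDE_CATEGORIES_BY_LANGUAGE` and the helpers `wide_categories` and `char_width` are hypothetical names for this example, not existing docutils API.

```python
import unicodedata

# Default: only wide ("W") and fullwidth ("F") characters take two columns.
DEFAULT_WIDE = frozenset({"W", "F"})

# Hypothetical override table. Only languages that differ from the
# default are listed; every other language falls back to DEFAULT_WIDE.
WIDE_CATEGORIES_BY_LANGUAGE = {
    "ja": frozenset({"W", "F", "A"}),
}


def wide_categories(language_code):
    """Return the East Asian Width categories treated as wide."""
    # Reduce tags such as "ja-JP" or "ja_JP" to the base language "ja".
    base = language_code.lower().replace("_", "-").split("-")[0]
    return WIDE_CATEGORIES_BY_LANGUAGE.get(base, DEFAULT_WIDE)


def char_width(char, language_code="en"):
    """Return 2 if `char` is wide for the given language, else 1."""
    category = unicodedata.east_asian_width(char)
    return 2 if category in wide_categories(language_code) else 1
```

With this layout, adding Chinese or Korean later would mean adding one entry to the table rather than another special case in the parser.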

Do we need to care about text parts in a different language or just check the document language (assuming that only one font is used to display the source)?
