Determine width of ambiguous characters by language [SF:patches:155] · Issue #637 · chrisjsewell/docutils · GitHub
In the Unicode specification, some characters are categorized as "ambiguous". Their width depends on the locale: in some locales (mainly East Asian), they are treated as wide. (For details, see the "East Asian Width" report: http://www.unicode.org/reports/tr11/#Definitions)
This patch changes the width of these "ambiguous" characters in docutils according to the language setting. It also includes the setting for Japanese.
This allows tables containing ambiguous characters (e.g. Cyrillic characters and symbols) to be parsed correctly.
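To illustrate the problem, here is a minimal sketch of a column-width calculation that can optionally count "ambiguous" characters as wide. The function name `column_width` and the `wide_classes` parameter are assumptions made for this example; they are not taken from the patch. Only `unicodedata.east_asian_width()` and `unicodedata.combining()` from the Python standard library are relied on.

```python
import unicodedata


def column_width(text, wide_classes=("W", "F")):
    """Return the number of fixed-width columns `text` occupies.

    `wide_classes` lists the East Asian Width classes that count as
    two columns (hypothetical parameter, for illustration only).
    """
    width = 0
    for char in text:
        # Combining characters are drawn over the previous character.
        if unicodedata.combining(char):
            continue
        if unicodedata.east_asian_width(char) in wide_classes:
            width += 2
        else:
            width += 1
    return width


cyrillic = "Привет"  # these Cyrillic letters are class "A" (ambiguous)
print(column_width(cyrillic))                                # 6
print(column_width(cyrillic, wide_classes=("W", "F", "A")))  # 12
```

If the editor shows these characters with double-width glyphs but the parser counts them as single-width, the cell borders of a table no longer line up from the parser's point of view.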
commenter: milde
posted: 2019-06-24 15:36:40.843000
title: #155 Determine width of ambiguous characters by language
Thank you for the patch.
I have some suggestions:
Do not specify "east_asian_width" for languages where it is the default ('WF'). Fall back to this default for languages that do not override it (or "East Asian" locales that use 'WF').
If "ja" (Japanese) is the only language where "ambiguous width characters" should be treated as wide (i.e. where it is likely that the program used to edit rST sources uses a fixed-width font with double-width glyphs for these characters), we could handle this exception in "states.py" or "statemachine.py". If Chinese and/or Korean also imply "ambiguous width -> wide", specifying the "east_asian_width" setting in parsers/rst/languages/.py (or languages/.py?) seems better.
Do we need to care about text parts in a different language or just check the document language (assuming that only one font is used to display the source)?
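The fallback suggested above could look roughly like the following sketch. It assumes that 'WF' means "classes W and F count as wide" and that a language overrides this by adding class A. All names here (`DEFAULT_WIDE_CLASSES`, `WIDE_CLASS_OVERRIDES`, `wide_classes_for`, `column_width_for`) are hypothetical and do not come from the patch or from docutils.

```python
import unicodedata

# Assumed default: only Wide and Fullwidth characters take two columns.
DEFAULT_WIDE_CLASSES = frozenset({"W", "F"})

# Hypothetical per-language overrides; only languages that differ from
# the default are listed.
WIDE_CLASS_OVERRIDES = {
    "ja": frozenset({"W", "F", "A"}),
}


def wide_classes_for(language_code):
    """Return the width classes counted as wide for a document language."""
    # Reduce tags such as "ja-JP" or "ja_JP" to the base language "ja".
    base = language_code.lower().replace("_", "-").split("-")[0]
    return WIDE_CLASS_OVERRIDES.get(base, DEFAULT_WIDE_CLASSES)


def column_width_for(text, language_code):
    """Column width of `text` under the given document language."""
    wide = wide_classes_for(language_code)
    return sum(
        2 if unicodedata.east_asian_width(char) in wide else 1
        for char in text
        if not unicodedata.combining(char)
    )


print(column_width_for("Я", "en"))  # 1: no override, default applies
print(column_width_for("Я", "ja"))  # 2: ambiguous treated as wide
```

This sketch checks only the document language, so it does not answer the question about text parts in a different language.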
author: tk0miya
created: 2019-04-13 03:44:18.434000
assigned: None
SF_url: https://sourceforge.net/p/docutils/patches/155
attachments: