Determine width of ambiguous characters by language [SF:patches:155] #637

chrisjsewell · 2020-08-09T14:54:57Z

author: tk0miya
created: 2019-04-13 03:44:18.434000
assigned: None
SF_url: https://sourceforge.net/p/docutils/patches/155

attachments:

https://sourceforge.net/p/docutils/patches/155/attachment/east_asian_width.patch

In the Unicode specification, some characters are categorized as "ambiguous". Their width depends on the locale: in some locales (mainly East Asian), they are treated as wide. (For details, see the "East Asian Width" report: http://www.unicode.org/reports/tr11/#Definitions)

This patch makes the width of these "ambiguous" characters in docutils depend on the language setting. It also includes the setting for Japanese.

This allows tables containing ambiguous characters (e.g. Cyrillic characters and symbols) to be parsed correctly.
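To illustrate the effect on table parsing, here is a minimal sketch of a width calculation in which the "ambiguous" class can be counted as either one or two columns. It uses only the standard-library `unicodedata` module. The names `text_width` and `ambiguous_wide` are hypothetical and are not taken from the attached patch.

```python
import unicodedata

# Columns occupied per East Asian Width class (see UAX #11).
# 'A' (ambiguous) is narrow here; the wide variant below overrides it.
NARROW_AMBIGUOUS = {"W": 2, "F": 2, "Na": 1, "H": 1, "N": 1, "A": 1}
WIDE_AMBIGUOUS = dict(NARROW_AMBIGUOUS, A=2)


def text_width(text, ambiguous_wide=False):
    """Return the number of terminal columns `text` occupies.

    Hypothetical helper: `ambiguous_wide` selects whether characters
    of class 'A' count as two columns (as in many Japanese setups).
    """
    widths = WIDE_AMBIGUOUS if ambiguous_wide else NARROW_AMBIGUOUS
    total = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        total += widths[unicodedata.east_asian_width(ch)]
    return total
```

With this sketch, a Cyrillic string such as "абв" measures 3 columns by default and 6 columns with `ambiguous_wide=True`, which is the difference that shifts table borders out of alignment.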

commenter: milde
posted: 2019-06-24 15:36:40.843000
title: #155 Determine width of ambiguous characters by language

Thank you for the patch.
I have some suggestions:

Do not specify "east_asian_width" for languages where it is the default ('WF'). Fall back to this default for languages that do not override it (or "East Asian" locales that use 'WF').
If "ja" (Japanese) is the only language where "ambiguous width characters" should be treated as wide (i.e. where it is likely that the program used to edit rST sources uses a fixed-width font with double-width glyphs for these characters), we could handle this exception in "states.py" or "statemachine.py". If Chinese and/or Korean also imply "ambiguous width -> wide", specifying the "east_asian_width" setting in parsers/rst/languages/.py (or languages/.py?) seems better.

Do we need to care about text parts in a different language or just check the document language (assuming that only one font is used to display the source)?
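The fallback idea in the suggestions above could be sketched as follows: only languages that deviate from the default are listed, and every other language code falls back to narrow ambiguous characters. This is an assumption-laden illustration, not docutils code. The names `AMBIGUOUS_WIDE_LANGUAGES` and `ambiguous_is_wide` are hypothetical, and whether Chinese or Korean belong in the set is exactly the open question in this thread.

```python
# Hypothetical set of languages whose editors typically render
# "ambiguous" characters as double-width. Only "ja" is confirmed
# by the patch description; other entries would be assumptions.
AMBIGUOUS_WIDE_LANGUAGES = frozenset({"ja"})


def ambiguous_is_wide(language_code):
    """Return True if ambiguous-width characters should count as wide.

    Accepts codes such as "ja", "ja-JP" or "ja_JP"; only the primary
    language subtag is compared. Unlisted languages use the default.
    """
    primary = language_code.replace("_", "-").split("-")[0].lower()
    return primary in AMBIGUOUS_WIDE_LANGUAGES
```

Checking only the document-level language code, as done here, corresponds to the simpler of the two options raised in the question about mixed-language text.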

chrisjsewell added open patches priority-5 labels Aug 9, 2020
