perf: Optimize format_word function performance by using direct unicode mapping #245
```rust
        // SAFETY: `c` has been checked to be a fullwidth digit or Latin letter,
        // so `c - 0xFEE0` is a valid ASCII character.
        unsafe { char::from_u32_unchecked(c as u32 - 0xFEE0) }
    }
    '\u{3000}' => ' ',
```
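The `unsafe` call is sound only because the enclosing match arm has already restricted `c` to ranges where subtracting `0xFEE0` lands on an ASCII digit or letter. If the `unsafe` were ever to be dropped, a minimal sketch is to let `checked_sub` and `char::from_u32` do the validation; `shift_to_halfwidth` and `FULLWIDTH_OFFSET` are hypothetical names, not part of this PR.

```rust
/// Distance between a fullwidth ASCII variant (U+FF01..=U+FF5E)
/// and its ASCII counterpart (U+0021..=U+007E).
const FULLWIDTH_OFFSET: u32 = 0xFEE0;

/// Hypothetical safe alternative to `char::from_u32_unchecked(c as u32 - 0xFEE0)`.
/// `checked_sub` returns `None` instead of underflowing for code points below
/// the offset, and `char::from_u32` rejects results that are not valid `char`s.
/// It does NOT check that `c` is a fullwidth form; the caller's match arm must.
fn shift_to_halfwidth(c: char) -> Option<char> {
    (c as u32)
        .checked_sub(FULLWIDTH_OFFSET)
        .and_then(char::from_u32)
}
```

For example, `shift_to_halfwidth('Ａ')` gives `Some('A')`, while `shift_to_halfwidth('a')` gives `None` because U+0061 is below the offset. The cost is one extra comparison per character, in exchange for not relying on an invariant stated only in a comment.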
How did you know that the invisible space I used before was `\u{3000}`? 😂
https://en.wikipedia.org/wiki/Halfwidth_and_Fullwidth_Forms_(Unicode_block)
U+FF00 does not correspond to a fullwidth ASCII 20 (space character), since that role is already fulfilled by U+3000 "ideographic space".
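The quoted note is why the space needs its own match arm: applying the `0xFEE0` offset to ASCII space (U+0020) would point at U+FF00, which is not a fullwidth space, whereas the character actually used, U+3000 IDEOGRAPHIC SPACE, sits far below the fullwidth block and cannot be reached by the offset at all. A small sketch of that arithmetic; the constant and function names are illustrative only.

```rust
const FULLWIDTH_OFFSET: u32 = 0xFEE0;

/// Where a "fullwidth space" would sit if it followed the offset rule: U+FF00.
const NAIVE_FULLWIDTH_SPACE: u32 = 0x20 + FULLWIDTH_OFFSET;

/// The character that actually fills that role.
const IDEOGRAPHIC_SPACE: char = '\u{3000}';

/// Maps U+3000 to ASCII space through an explicit arm, as the diff does,
/// and leaves every other character untouched.
fn normalize_space(c: char) -> char {
    match c {
        IDEOGRAPHIC_SPACE => ' ',
        _ => c,
    }
}
```

Since `0x3000 < 0xFEE0`, subtracting the offset from U+3000 would underflow, so no arithmetic shortcut covers it; the dedicated `'\u{3000}' => ' '` arm is the straightforward fix.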
```rust
let out = text
    .chars()
    .map(|c| match c {
        '\u{FF10}'..='\u{FF19}' | '\u{FF21}'..='\u{FF3A}' | '\u{FF41}'..='\u{FF5A}' => {
```
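Taken together, the two reviewed hunks suggest a single pass over `text.chars()`. The sketch below reconstructs that pass as a standalone function; the name `to_halfwidth_narrow`, the `_ => c` fallback arm, and collecting into a `String` are assumptions filled in around the visible lines. With these three ranges, fullwidth punctuation such as `＠` (U+FF20, which falls between the digit and uppercase ranges) is passed through unchanged.

```rust
/// Reconstruction (surrounding code assumed) of the reviewed mapping:
/// fullwidth digits and Latin letters become ASCII, U+3000 becomes a space,
/// and everything else is kept as-is.
fn to_halfwidth_narrow(text: &str) -> String {
    text.chars()
        .map(|c| match c {
            // Fullwidth 0-9, A-Z, a-z.
            '\u{FF10}'..='\u{FF19}' | '\u{FF21}'..='\u{FF3A}' | '\u{FF41}'..='\u{FF5A}' => {
                // SAFETY: `c` is in one of the ranges above, so `c - 0xFEE0`
                // is an ASCII digit or letter and therefore a valid `char`.
                unsafe { char::from_u32_unchecked(c as u32 - 0xFEE0) }
            }
            // Ideographic space.
            '\u{3000}' => ' ',
            // Assumed fallback: leave all other characters unchanged.
            _ => c,
        })
        .collect()
}
```

For example, `"Ｈｅｌｌｏ　Ｗｏｒｌｄ＠２０２４"` becomes `"Hello World＠2024"`: the letters, digits, and space are converted, but `＠` is not.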
Is there a reference link for this range? I'd like to check what it covers.
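It looks like this could be changed to cover the complete Fullwidth table; there were some characters I had missed before, for example: `＠` -> `@`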
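One way to follow that suggestion, sketched under the assumption that "the complete Fullwidth table" means the fullwidth ASCII variants U+FF01..=U+FF5E, which map to U+0021..=U+007E by the same `0xFEE0` offset, plus U+3000: a single contiguous range arm replaces the three narrow ones, so `＠` and other punctuation are covered too. The function name `to_halfwidth` is hypothetical.

```rust
/// Converts the fullwidth ASCII variants (U+FF01..=U+FF5E) and U+3000
/// to their halfwidth forms; all other characters are passed through.
fn to_halfwidth(text: &str) -> String {
    text.chars()
        .map(|c| match c {
            // One contiguous range covers digits, letters, and punctuation.
            '\u{FF01}'..='\u{FF5E}' => {
                // The result is in U+0021..=U+007E, so `from_u32` always
                // succeeds here; `unwrap_or(c)` only avoids `unsafe`.
                char::from_u32(c as u32 - 0xFEE0).unwrap_or(c)
            }
            '\u{3000}' => ' ',
            _ => c,
        })
        .collect()
}
```

For example, `"＠ｕｓｅｒ！　（ｔｅｓｔ）"` becomes `"@user! (test)"`. Code points after U+FF5E in the same Unicode block (such as U+FF5F or the halfwidth katakana) do not follow the offset rule, so this sketch leaves them unchanged.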
I'll follow up and build on top of your changes.
This PR optimizes the `format_word` function to improve performance when converting full-width characters to half-width.

Changes
This change should reduce CPU usage and memory allocation when processing text with many full-width characters. For example, here are the results of running the `format_json_2k` benchmark on my local PC:
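A sketch of why a direct per-character mapping can keep allocations low: every character in U+FF01..=U+FF5E and U+3000 takes 3 bytes in UTF-8, while its halfwidth replacement takes 1, and unmapped characters keep their length, so the output is never longer than the input. A buffer sized to `text.len()` therefore never has to grow. Whether `format_word` preallocates like this is not shown in the diff above; `halfwidth_char` and `to_halfwidth_prealloc` are hypothetical names.

```rust
/// Maps one character to its halfwidth form (fullwidth ASCII variants and U+3000).
fn halfwidth_char(c: char) -> char {
    match c {
        '\u{FF01}'..='\u{FF5E}' => char::from_u32(c as u32 - 0xFEE0).unwrap_or(c),
        '\u{3000}' => ' ',
        _ => c,
    }
}

/// At most one allocation: mapped chars shrink from 3 UTF-8 bytes to 1 and
/// unmapped chars keep their length, so `text.len()` bytes always suffice.
fn to_halfwidth_prealloc(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        out.push(halfwidth_char(c));
    }
    debug_assert!(out.len() <= text.len());
    out
}
```

By contrast, `collect::<String>()` only sees a conservative size hint from `Chars`, so it may reallocate a few times on long inputs; preallocating removes that, at the cost of possibly over-reserving when many characters shrink.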