bug(expr): `SUBSTR` of unicode produces error: "byte index _ is not a char boundary" · Issue #9065 · risingwavelabs/risingwave · GitHub
Closed
@jon-chuang

Description


Substring positions and lengths should be counted in Unicode characters, not in bytes.

To reproduce:
RisingWave:

=> select substr('Mér', 1, 2);
SSL SYSCALL error: EOF detected

PostgreSQL:

=> select substr('Mér', 1, 2);
 substr 
--------
 Mé
(1 row)
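A minimal sketch of why the byte-based computation breaks on this input, assuming `é` is the precomposed code point U+00E9 (two bytes in UTF-8). `substr('Mér', 1, 2)` computed in bytes slices bytes `0..2`, which ends inside `é`; in Rust, `&s[0..2]` panics with a message beginning "byte index 2 is not a char boundary". The helpers `byte_prefix` and `char_prefix` are hypothetical and only illustrate the two ways of counting; `str::get` is used so the byte version returns `None` instead of panicking.

```rust
/// Hypothetical helper: what a byte-based `substr(s, 1, n)` would slice.
/// `str::get` returns `None` when `n` is not on a char boundary, whereas
/// plain indexing (`&s[0..n]`) would panic in that case.
fn byte_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(0..n)
}

/// Hypothetical helper: the first `n` characters (Unicode scalar values),
/// which is what PostgreSQL returns for `substr(s, 1, n)`.
fn char_prefix(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}
```

For `"Mér"`, `s.len()` is 4 while `s.chars().count()` is 3, `s.is_char_boundary(2)` is false, `byte_prefix("Mér", 2)` is `None`, and `char_prefix("Mér", 2)` is `"Mé"`, matching the psql output.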

The same happens with `'Mér'::char(3)`.


Resources:

src/expr/src/vector_op/substr.rs:48:23

pub fn substr_start_for(s: &str, start: i32, count: i32, writer: &mut dyn Write) -> Result<()> {
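A hypothetical character-based variant of the function above, sketched under assumptions: `Write` is taken to be `std::fmt::Write`, and RisingWave's `Result` is replaced by a local `SubstrError` type. It follows PostgreSQL's `substr(s, start, count)` semantics as commonly documented: `start` is 1-based and may be zero or negative, a negative `count` is an error, and the result covers character positions `[start, start + count)` clipped to the string. This is an illustration of counting by characters, not the actual RisingWave fix.

```rust
use std::fmt::{self, Write};

/// Local stand-in for RisingWave's error type (assumption).
#[derive(Debug, PartialEq)]
enum SubstrError {
    NegativeLength,
    Fmt,
}

impl From<fmt::Error> for SubstrError {
    fn from(_: fmt::Error) -> Self {
        SubstrError::Fmt
    }
}

/// Hypothetical sketch: `substr(s, start, count)` counting Unicode characters.
fn substr_start_for_chars(
    s: &str,
    start: i32,
    count: i32,
    writer: &mut dyn Write,
) -> Result<(), SubstrError> {
    if count < 0 {
        return Err(SubstrError::NegativeLength);
    }
    // Widen to i64 so that start + count cannot overflow.
    let begin = i64::from(start).max(1); // first position, 1-based
    let end = i64::from(start) + i64::from(count); // exclusive, 1-based
    if end <= begin {
        return Ok(()); // the requested range lies before the string: empty result
    }
    let skip = (begin - 1) as usize;
    let take = (end - begin) as usize;
    // Iterating over chars means no byte offset is ever computed by hand,
    // so a multi-byte character can never be split.
    for c in s.chars().skip(skip).take(take) {
        writer.write_char(c)?;
    }
    Ok(())
}
```

With a `String` as the writer, `substr_start_for_chars("Mér", 1, 2, &mut out)` writes `"Mé"`, and `substr_start_for_chars("Mér", 0, 2, &mut out)` writes `"M"`, since position 0 lies before the string.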

https://stackoverflow.com/questions/4249745/does-postgresql-varchar-count-using-unicode-character-length-or-ascii-character
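The linked question concerns how PostgreSQL measures `varchar(n)` and `char(n)`: the length limit is in characters, not bytes. A small sketch of the difference for the `'Mér'::char(3)` case; both helpers are hypothetical and exist only for illustration.

```rust
/// Hypothetical helper: does `s` fit a PostgreSQL-style `char(n)`/`varchar(n)`,
/// where `n` counts characters (Unicode scalar values)?
fn fits_in_char_n(s: &str, n: usize) -> bool {
    s.chars().count() <= n
}

/// Hypothetical helper: the byte-based check, which is the wrong measure here.
fn fits_in_n_bytes(s: &str, n: usize) -> bool {
    s.len() <= n
}
```

`"Mér"` has 3 characters but 4 UTF-8 bytes, so it fits `char(3)` when counted by characters, while a byte-based check would reject it or truncate it in the middle of `é`.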

Labels

type/bug