Releases · cometkim/unicode-segmenter

Minor Changes

75492dc: Expose an internal state, _hd.

It holds the first code point of a segment, whose range often needs to be checked.

For example,

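import { graphemeSegments } from 'unicode-segmenter/grapheme';
import { isBMP } from 'unicode-segmenter/utils';

for (const { segment } of graphemeSegments(text)) {
  // The `!` non-null assertion is also required in TypeScript.
  const cp = segment.codePointAt(0)!;
  if (isBMP(cp)) {
    // ...
  }
}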

It can be replaced by the _hd state, with no additional overhead.
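To illustrate why exposing the head code point is free, here is a minimal sketch. It is not the library's implementation: toySegments and ToySegment are hypothetical names, and the toy splits text per code point rather than per grapheme cluster. The point is only that a segmenter has already read the first code point by the time it emits a segment, so it can hand that value over instead of making the caller decode it again.

```typescript
// Hypothetical toy segmenter: one segment per code point (NOT real grapheme rules).
interface ToySegment {
  segment: string;
  index: number;
  /** First code point of the segment, already known to the segmenter. */
  _hd: number;
}

function toySegments(input: string): ToySegment[] {
  const out: ToySegment[] = [];
  let index = 0;
  while (index < input.length) {
    // The segmenter has to read this code point anyway to decide boundaries.
    const cp = input.codePointAt(index)!;
    const segment = String.fromCodePoint(cp);
    out.push({ segment, index, _hd: cp });
    index += segment.length; // 1 or 2 UTF-16 code units
  }
  return out;
}

// Callers can check bounds on `_hd` directly, without `codePointAt(0)!`.
let bmpCount = 0;
for (const { _hd } of toySegments("a\u{1F600}b")) {
  if (_hd <= 0xffff) bmpCount++;
}
```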

Patch Changes

cd63858: Export bundled entries (/bundle/*.js)

Minor Changes

21cd789: Removed deprecated APIs
- searchGrapheme in unicode-segmenter/grapheme
- takeChar and takeCodePoint in unicode-segmenter/utils
These were used internally before, but never from outside.
483d258: Reduced bundle size, while keeping the best perf

Some details:
- Refactored to share the same internal code path as much as possible.
- Removed the pre-computed jump table; losing that optimization was compensated for by other perf improvements.
- The previous array layout, meant to avoid accidental de-opt, turned out to be overkill. The regular tuple array is well optimized, so I fell back to good old plain binary search.
- Tried some experiments, such as a new encoding and an Eytzinger layout, for more aggressive improvements, but without success.
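The "plain binary search over a tuple array" mentioned above can be sketched as follows. This is only an illustration under assumptions: the [start, end, category] tuple shape, the findRange name, and the sample table are hypothetical and not taken from the library's source.

```typescript
// Assumed tuple shape: [start, end, category], sorted by start, non-overlapping.
type RangeEntry = [number, number, number];

// Returns the entry whose inclusive range contains `cp`, or null if none does.
function findRange(table: RangeEntry[], cp: number): RangeEntry | null {
  let lo = 0;
  let hi = table.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const entry = table[mid]!;
    if (cp < entry[0]) {
      hi = mid - 1; // cp lies before this range
    } else if (cp > entry[1]) {
      lo = mid + 1; // cp lies after this range
    } else {
      return entry; // start <= cp <= end
    }
  }
  return null;
}

// Made-up sample table for illustration (ASCII digits, upper case, lower case).
const sampleTable: RangeEntry[] = [
  [0x30, 0x39, 1],
  [0x41, 0x5a, 2],
  [0x61, 0x7a, 3],
];
const hit = findRange(sampleTable, 0x41);
```

Keeping every entry the same shape and the array free of holes is what lets engines treat such a table as a regular, well-optimized array.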

Patch Changes

a5f486f: Fix bloat in the NPM package.

package.tgz was mostly bloated by CommonJS interop and sourcemaps.

However, sourcemaps aren't necessary here since the package ships its sources as is,
and the CommonJS entries shouldn't be any different.

This is now fixed by using simpler transpilation for CommonJS entries and removing the sourcemap files.
Inaccessible entries were also removed.

As a result, the total unpacked package size is down from 250 KB to 135 KB.

Note: Node.js v22 will stabilize require(ESM), which will allow CommonJS projects to use this package without having to maintain separate entries. I'm very excited about that, and looking forward to it becoming more "common". The first major release may consider ending support for CommonJS entries and TypeScript's "Node" resolution.

Patch Changes

94ed937: Improved perf and bundle size a bit

It seems that using TypedArray isn't helpful,
and dereferencing many prototypes may cause deopt.

A plain Array is good enough as long as it is kept packed.
de71269: Update Intl type definition

Patch Changes

9d688d8: grapheme: Rename countGrapheme() to countGraphemes(). The existing name remains as a deprecated alias.
be49399: grapheme: Add splitGraphemes() utility
5e86659: grapheme: Add more detail to API JSDoc

Minor Changes

ffb41fb: Code size is significantly reduced; the minified JS is now about half the size

There are also some performance improvements.
They are modest, but improving size without giving up performance is a huge win.
- Compressed the Unicode data further using Base36.
- Changed the internal representation to TypedArray to improve its access pattern.
- Shrank the grapheme lookup table size.
  This does not impact performance except for some edge cases like Hindi and Demonic, but it does reduce the bundle size.
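As a rough illustration of how Base36 can shrink a range table, here is a sketch that stores sorted ranges as delta-encoded Base36 numbers. The notes above do not describe the library's actual encoding, so the delta scheme, the comma separator, and the encodeRanges and decodeRanges names are all assumptions made for this example.

```typescript
// Assumed input: inclusive [start, end] ranges, sorted and non-overlapping.
type CodeRange = [number, number];

// Store each range as (gap from the previous end, length), both in Base36.
function encodeRanges(ranges: CodeRange[]): string {
  const parts: string[] = [];
  let prev = 0;
  for (const [start, end] of ranges) {
    parts.push((start - prev).toString(36), (end - start).toString(36));
    prev = end;
  }
  return parts.join(",");
}

// Reverse the process: accumulate the deltas back into absolute code points.
function decodeRanges(data: string): CodeRange[] {
  const nums = data.split(",").map((s) => parseInt(s, 36));
  const ranges: CodeRange[] = [];
  let prev = 0;
  for (let i = 0; i + 1 < nums.length; i += 2) {
    const start = prev + nums[i]!;
    const end = start + nums[i + 1]!;
    ranges.push([start, end]);
    prev = end;
  }
  return ranges;
}

const packed = encodeRanges([
  [0x300, 0x36f],
  [0x483, 0x489],
]);
const unpacked = decodeRanges(packed);
```

Small deltas need fewer digits than absolute code points, and Base36 digits are plain ASCII, so the resulting string also compresses well with gzip or brotli.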
9e0feca: Update to Unicode® 16.0.0

Patch Changes

3665cf7: Fix Hindi text segmentation

Minor Changes

73f5e6b: Significantly reduced bundle size by compressing the data table. The grapheme segmentation library now takes only 6.6kB (gzip) or 4.4kB (brotli)!

Patch Changes

b045320: Fix isSMP, and add more plane utils (isSIP, isTIP, isSSP)
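For reference, the plane checks named above correspond to fixed code point ranges defined by Unicode. The sketch below reuses the function names from this entry, but the bodies are an illustration written for this note, not the library's code.

```typescript
// Basic Multilingual Plane: U+0000..U+FFFF (plane 0)
const isBMP = (cp: number): boolean => cp >= 0x0000 && cp <= 0xffff;

// Supplementary Multilingual Plane: U+10000..U+1FFFF (plane 1)
const isSMP = (cp: number): boolean => cp >= 0x10000 && cp <= 0x1ffff;

// Supplementary Ideographic Plane: U+20000..U+2FFFF (plane 2)
const isSIP = (cp: number): boolean => cp >= 0x20000 && cp <= 0x2ffff;

// Tertiary Ideographic Plane: U+30000..U+3FFFF (plane 3)
const isTIP = (cp: number): boolean => cp >= 0x30000 && cp <= 0x3ffff;

// Supplementary Special-purpose Plane: U+E0000..U+EFFFF (plane 14)
const isSSP = (cp: number): boolean => cp >= 0xe0000 && cp <= 0xeffff;

// Example: classify the first code point of a string.
const head = "\u{1F600}".codePointAt(0)!;
const headIsSMP = isSMP(head);
```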

Patch Changes

447b484: Fix the polyfill so that it does not override an existing implementation and is assigned as non-enumerable
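The behavior described in this fix can be sketched with Object.defineProperty. This is a minimal illustration, not the package's polyfill code: definePolyfill is a hypothetical helper, and a stand-in object is used instead of a real global so the example has no side effects.

```typescript
// Hypothetical helper: install `value` under `name` only when nothing is there yet.
// Returns true when the polyfill was installed, false when an existing value was kept.
function definePolyfill(target: object, name: string, value: unknown): boolean {
  if (name in target) {
    return false; // never override an existing implementation
  }
  Object.defineProperty(target, name, {
    value,
    enumerable: false, // stays hidden from Object.keys() and for...in
    writable: true,
    configurable: true,
  });
  return true;
}

// Stand-in for a global namespace object (an assumption for the demo).
const fakeNamespace: Record<string, unknown> = { Collator: "native" };
class FakeSegmenter {}

const installed = definePolyfill(fakeNamespace, "Segmenter", FakeSegmenter);
const overridden = definePolyfill(fakeNamespace, "Collator", "polyfilled");
```

Plain assignment would create an enumerable property, which is why the sketch goes through a property descriptor instead.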

Patch Changes

04fe2fc: Fix sourcemap reference error
- Include missing sourcemap files for transformed cjs entries
- Remove unnecessary transforms for esm entries and remove source map reference

Releases: cometkim/unicode-segmenter

- unicode-segmenter@0.13.0: Minor Changes, Patch Changes
- unicode-segmenter@0.12.0: Minor Changes
- unicode-segmenter@0.11.3: Patch Changes
- unicode-segmenter@0.11.2: Patch Changes
- unicode-segmenter@0.11.1: Patch Changes
- unicode-segmenter@0.11.0: Minor Changes
- unicode-segmenter@0.10.1: Patch Changes
- unicode-segmenter@0.10.0: Minor Changes, Patch Changes
- unicode-segmenter@0.9.2: Patch Changes
- unicode-segmenter@0.9.1: Patch Changes