WorksApplications/Sudachi at v0.3.2

Sudachi

Sudachi is a Japanese morphological analyzer. Morphological analysis consists mainly of the following tasks:

  • Segmentation
  • Part-of-speech tagging
  • Normalization

Features

Sudachi has the following features.

  • Multiple-length segmentation
    • You can switch the segmentation mode as needed
    • Morphemes and named entities are extracted in a single pass
  • Large lexicon
    • Based on UniDic and NEologd
  • Plugins
    • You can change the behavior of individual processing steps
  • Works closely with the synonym dictionary
    • The synonym dictionary will be released at a later date

Dictionaries

Sudachi has three types of dictionaries: small, core, and full.

For more details, see SudachiDict.

How to use the small / full dictionary

Run the command line tool with a configuration string that selects the dictionary file.

$ java -jar sudachi-XX.jar -s '{"systemDict":"system_small.dic"}'

Use on the command line

$ java -jar sudachi-XX.jar [-r conf] [-s json] [-m mode] [-a] [-d] [-f] [-o output] [file...]

Options

  • -r conf: specifies the settings file (mutually exclusive with -s)
  • -s json: additional settings as a JSON string, overriding the defaults (mutually exclusive with -r)
  • -p directory: root directory of resources
  • -m {A|B|C}: specifies the split mode
  • -a: also outputs the dictionary form and the reading form
  • -d: dumps debug output
  • -o output: specifies the output file (default: standard output)
  • -f: ignores errors and continues processing

Examples

$ echo 東京都へ行く | java -jar target/sudachi.jar
東京都  名詞,固有名詞,地名,一般,*,*     東京都
へ      助詞,格助詞,*,*,*,*     へ
行く    動詞,非自立可能,*,*,五段-カ行,終止形-一般       行く
EOS

$ echo 東京都へ行く | java -jar target/sudachi.jar -a
東京都  名詞,固有名詞,地名,一般,*,*     東京都  東京都  トウキョウト
へ      助詞,格助詞,*,*,*,*     へ      へ      エ
行く    動詞,非自立可能,*,*,五段-カ行,終止形-一般       行く    行く    イク
EOS

$ echo 東京都へ行く | java -jar target/sudachi.jar -m A
東京    名詞,固有名詞,地名,一般,*,*     東京
都      名詞,普通名詞,一般,*,*,*        都
へ      助詞,格助詞,*,*,*,*     へ
行く    動詞,非自立可能,*,*,五段-カ行,終止形-一般       行く
EOS
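
The output shown above is line-oriented: one morpheme per line, then an EOS line. The following is a minimal sketch of how such output could be read back into a program. It assumes the fields are separated by tab characters, in the order surface, part of speech, normalized form, and (with -a) dictionary form and reading form. SudachiOutputParser and Token are hypothetical names and are not part of Sudachi.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Sudachi): parses the CLI's line-oriented output. */
final class SudachiOutputParser {

    /** One output line; dictionaryForm and readingForm are null unless -a was used. */
    static final class Token {
        final String surface;
        final List<String> partOfSpeech;
        final String normalizedForm;
        final String dictionaryForm;
        final String readingForm;

        Token(String[] f) {
            surface = f[0];
            partOfSpeech = Arrays.asList(f[1].split(","));
            normalizedForm = f[2];
            dictionaryForm = f.length > 3 ? f[3] : null;
            readingForm = f.length > 4 ? f[4] : null;
        }
    }

    /** Parses the lines of one sentence, stopping at the EOS marker. */
    static List<Token> parse(List<String> lines) {
        List<Token> tokens = new ArrayList<>();
        for (String line : lines) {
            if (line.equals("EOS")) {
                break;
            }
            // Assumption: fields are separated by single tab characters.
            String[] fields = line.split("\t");
            if (fields.length < 3) {
                throw new IllegalArgumentException("unexpected line: " + line);
            }
            tokens.add(new Token(fields));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "東京都\t名詞,固有名詞,地名,一般,*,*\t東京都\t東京都\tトウキョウト",
            "へ\t助詞,格助詞,*,*,*,*\tへ\tへ\tエ",
            "行く\t動詞,非自立可能,*,*,五段-カ行,終止形-一般\t行く\t行く\tイク",
            "EOS");
        for (Token t : parse(lines)) {
            System.out.println(t.surface + " -> " + t.readingForm + " " + t.partOfSpeech.get(0));
        }
    }
}
```

Keeping the part of speech as a list makes it easy to filter on the first element (for example, nouns only) without further string handling.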

How to use the API

You can find details in the Javadoc.

To compile an application with the Sudachi API, declare a dependency on Sudachi in your Maven project.

<dependency>
  <groupId>com.worksap.nlp</groupId>
  <artifactId>sudachi</artifactId>
  <version>0.3.2</version>
</dependency>

The modes of splitting

Sudachi provides three split modes: A, B, and C, from shortest to longest. Mode A divides text into the shortest units, equivalent to UniDic short units. Mode C extracts named entities as single units. Mode B produces middle units between A and C.

The following are examples with the core dictionary.

A:選挙/管理/委員/会
B:選挙/管理/委員会
C:選挙管理委員会

A:客室/乗務/員
B:客室/乗務員
C:客室乗務員

A:労働/者/協同/組合
B:労働者/協同/組合
C:労働者協同組合

A:機能/性/食品
B:機能性/食品
C:機能性食品

The following are examples with the full dictionary.

A:医薬/品/安全/管理/責任/者
B:医薬品/安全/管理/責任者
C:医薬品安全管理責任者

A:消費/者/安全/調査/委員/会
B:消費者/安全/調査/委員会
C:消費者安全調査委員会

A:さっぽろ/テレビ/塔
B:さっぽろ/テレビ塔
C:さっぽろテレビ塔

A:カンヌ/国際/映画/祭
B:カンヌ/国際/映画祭
C:カンヌ国際映画祭

In full-text search, using modes A and B together can improve precision and recall.
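
One way to picture this is to index the terms from both segmentations of the same text, so that a query for a short unit and a query for a longer unit can both match. The sketch below hard-codes the mode A and mode B segmentations from the first example and makes no Sudachi calls. MultiGranularityIndex and indexTerms are hypothetical names, and a real search engine would also store positions and handle query-side analysis.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration (not part of Sudachi) of indexing two split modes. */
final class MultiGranularityIndex {

    /** Returns the union of the terms produced by two segmentations of the same text. */
    static Set<String> indexTerms(List<String> modeA, List<String> modeB) {
        Set<String> terms = new LinkedHashSet<>(modeA);
        terms.addAll(modeB);
        return terms;
    }

    public static void main(String[] args) {
        // Segmentations of the same text, copied from the core dictionary examples.
        List<String> a = Arrays.asList("選挙", "管理", "委員", "会");
        List<String> b = Arrays.asList("選挙", "管理", "委員会");

        Set<String> terms = indexTerms(a, b);
        System.out.println(terms);
        System.out.println(terms.contains("委員"));   // query on a short unit
        System.out.println(terms.contains("委員会")); // query on a middle unit
    }
}
```

The short units let a query for 委員 find the document, while the middle units let 委員会 match as a single term instead of as two unrelated parts.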

Plugins

You can use or make plugins which modify the behavior of Sudachi.

| Type of Plugins | Example |
|---|---|
| Modify the Inputs | Character normalization |
| Make OOVs | Considering script styles |
| Connect Words | Inhibition, overwriting costs |
| Modify the Path | Fixing person names, equalization of splitting |

Prepared Plugins

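The following plugins are provided.

| Type of Plugins | Plugin | Description |
|---|---|---|
| Modify the Inputs | Character normalization | Full/half-width, cases, variants (customizable) |
| Modify the Inputs | Normalization of prolonged sound marks | Normalizes "~" and runs of "ー" |
| Make OOVs | Make one-character OOVs | Used as the fallback |
| Make OOVs | MeCab-compatible OOVs | |
| Connect Words | Inhibition | Specified by part-of-speech (customizable) |
| Modify the Path | Join Katakana OOVs | |
| Modify the Path | Join numerics | |
| Modify the Path | Equalization of splitting* | Smooths splitting granularity between OOVs and known words |
| Modify the Path | Normalize numerics | Normalizes Kanji numerals and scales |
| Modify the Path | Estimate person names* | Estimates person-name spans from honorifics and context |

* will be released at a later date.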

Normalized Form

Sudachi normalizes the following kinds of variation.

  • Okurigana
    • e.g. 打込む → 打ち込む
  • Script
    • e.g. かつ丼 → カツ丼
  • Variant
    • e.g. 附属 → 付属
  • Misspelling
    • e.g. シュミレーション → シミュレーション
  • Contracted form
    • e.g. ちゃあ → ては
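
A typical use of the normalized form is to compare or index terms by it, so that spelling variants count as the same term. The sketch below illustrates the idea with a hand-made table holding only the pairs listed above. In real use the normalized form would come from the analyzer's output, not from such a table. NormalizedFormDemo and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration (not part of Sudachi) of matching by normalized form. */
final class NormalizedFormDemo {

    // Hand-made stand-in for dictionary lookups, filled with the pairs listed above.
    static final Map<String, String> NORMALIZED = new HashMap<>();
    static {
        NORMALIZED.put("打込む", "打ち込む");
        NORMALIZED.put("かつ丼", "カツ丼");
        NORMALIZED.put("附属", "付属");
        NORMALIZED.put("シュミレーション", "シミュレーション");
        NORMALIZED.put("ちゃあ", "ては");
    }

    /** Returns the normalized form, or the surface itself when no entry exists. */
    static String normalize(String surface) {
        return NORMALIZED.getOrDefault(surface, surface);
    }

    /** Two surfaces are treated as the same term when their normalized forms agree. */
    static boolean sameTerm(String x, String y) {
        return normalize(x).equals(normalize(y));
    }

    public static void main(String[] args) {
        System.out.println(sameTerm("附属", "付属"));
        System.out.println(sameTerm("かつ丼", "カツ丼"));
        System.out.println(sameTerm("附属", "カツ丼"));
    }
}
```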

Character Normalization

DefaultInputTextPlugin normalizes an input text in the following order.

  1. To lower case by Character.toLowerCase()
  2. Unicode normalization by NFKC

When rewrite.def contains the following kinds of entries, DefaultInputTextPlugin does not apply the above processing to the matching characters; the entry takes precedence.

  • Ignore
# single code point: this character is skipped in character normalization
髙
  • Replace
# rewrite rule: <target> <replacement>
A' Ā

If the number of characters increases as a result of character normalization, Sudachi may output morphemes whose length is 0 in the original input text.
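
The rules in this section can be sketched with the Java standard library alone. The code below is an illustration under stated assumptions, not the DefaultInputTextPlugin source. It assumes text is processed one code point at a time and that the longest matching replace target wins. The ignore set and replace map stand in for the parsed contents of rewrite.def, and CharNormalizationSketch is a hypothetical name. Normalizing code point by code point can differ from normalizing the whole string when combining marks are involved.

```java
import java.text.Normalizer;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch (not Sudachi's implementation) of the normalization rules. */
final class CharNormalizationSketch {

    /**
     * Normalizes text code point by code point: replace rules first, then
     * lower-casing and NFKC unless the code point is in the ignore set.
     */
    static String normalize(String text, Set<Integer> ignore, Map<String, String> replace) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            // Replace rules take precedence over the default processing.
            String matched = null;
            for (String target : replace.keySet()) {
                if (!target.isEmpty() && text.startsWith(target, i)
                        && (matched == null || target.length() > matched.length())) {
                    matched = target; // assumption: prefer the longest matching target
                }
            }
            if (matched != null) {
                out.append(replace.get(matched));
                i += matched.length();
                continue;
            }
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (ignore.contains(cp)) {
                out.appendCodePoint(cp); // listed as "ignore": keep unchanged
                continue;
            }
            // Step 1: lower case. Step 2: Unicode NFKC normalization.
            String lower = new String(Character.toChars(Character.toLowerCase(cp)));
            out.append(Normalizer.normalize(lower, Normalizer.Form.NFKC));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Set<Integer> ignore = Collections.singleton(0x9AD9);                    // 髙
        Map<String, String> replace = Collections.singletonMap("A'", "\u0100"); // A' -> Ā

        System.out.println(normalize("ＳＵＤＡＣＨＩ", ignore, replace));
        System.out.println(normalize("髙", ignore, replace));
        System.out.println(normalize("A'", ignore, replace));
        // U+3231 is a single code point that NFKC expands to three characters.
        System.out.println(normalize("\u3231", ignore, replace));
    }
}
```

The last call shows how the character count can grow: one input code point becomes three output characters. If the analyzer then splits inside that span, some of the resulting morphemes can only map back to a zero-length range of the original text.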

User Dictionary

To create and use your own dictionaries, please refer to docs/user_dict.md.

Comparison with MeCab and Kuromoji

|  | Sudachi | MeCab | kuromoji |
|---|---|---|---|
| Multiple segmentation | Yes | No | Limited ^a |
| Normalization | Yes | No | Limited ^b |
| Joining, correction | Yes | No | Limited ^b |
| Multiple user dictionaries | Yes | Yes | No |
| Memory efficiency | Good ^c | Poor | Good |
| Accuracy | Good | Good | Good |
| Speed | Good | Excellent | Good |

  • ^a: approximation with n-best
  • ^b: with Lucene filters
  • ^c: the dictionary is memory-mapped and shared across multiple Java VMs

Future Releases

  • Speeding up
  • Releasing plugins
  • Improving the accuracy
  • Adding more split information
  • Adding more normalized forms
  • Fixing reading forms (pronunciation -> Furigana)
  • Coordinating segmentation with the synonym dictionary

Licenses

Sudachi

Sudachi by Works Applications Co., Ltd. is licensed under the Apache License, Version 2.0.

Copyright (c) 2017 Works Applications Co., Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Elasticsearch

A Sudachi plug-in for Elasticsearch is also available.

Python

A Python implementation of Sudachi is also available.

Slack

We have a Slack workspace for developers and users to ask questions and discuss a variety of topics.

Citing Sudachi

We have published a paper about Sudachi and its language resources: "Sudachi: a Japanese Tokenizer for Business" (Takaoka et al., LREC2018).

When citing Sudachi in papers, books, or services, please use the following BibTeX entry:

@InProceedings{TAKAOKA18.8884,
  author = {Kazuma Takaoka and Sorami Hisamoto and Noriko Kawahara and Miho Sakamoto and Yoshitaka Uchida and Yuji Matsumoto},
  title = {Sudachi: a Japanese Tokenizer for Business},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-00-9},
  language = {english}
  }
