GitHub - davidmrau/dl4nlp: Language Identification Models

davidmrau/dl4nlp

Deep Learning for Natural Language Technologies

In this paper, we apply a Multilayer Perceptron (MLP) with tf-idf features, an MLP with n-gram features, and a Recurrent Neural Network (RNN) to identify the predominant language of a given paragraph from the WiLI-2018 dataset [Thoma, 2018], which covers 235 distinct languages. On the test set, the MLP with tf-idf features achieves 90% accuracy, the MLP with n-grams 93%, and the RNN 91%.

This repository contains implementations of the three language identification models:

  1. MLP classifier with tf-idf features

  2. MLP classifier with n-grams features

  3. RNN
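
As a rough illustration of the features behind the first two models, the following sketch extracts character n-grams from a paragraph and weights them with tf-idf. It is not taken from the scripts in this repository: the function names `char_ngrams` and `tfidf_vectors`, the choice of character bigrams, and the smoothed idf formula are all assumptions made for this example, and the real pipeline may tokenize and weight differently.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Return the list of overlapping character n-grams in `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(paragraphs, n=2):
    """Compute one sparse tf-idf vector (dict: n-gram -> weight) per paragraph.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    counts = [Counter(char_ngrams(p, n)) for p in paragraphs]

    # Document frequency: number of paragraphs that contain each n-gram.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    num_docs = len(paragraphs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vec = {}
        for gram, freq in c.items():
            tf = freq / total
            # Smoothed idf (an assumed variant), so that n-grams present in
            # every paragraph still receive a non-zero weight.
            idf = math.log((1 + num_docs) / (1 + df[gram])) + 1.0
            vec[gram] = tf * idf
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    sample = ["the cat sat on the mat", "der Hund sitzt auf der Matte"]
    for paragraph, vec in zip(sample, tfidf_vectors(sample)):
        top = sorted(vec, key=vec.get, reverse=True)[:3]
        print(paragraph, "->", top)
```

Vectors of this kind could then be fed to an MLP classifier with one output per language. The n-gram length and any vocabulary cut-off are tuning choices that this sketch leaves open.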

Each folder contains a readme that explains the parameters of each Python script. Example bash scripts show how each Python script in the pipeline can be run.
