This repository contains code for the paper "Detect All Abuse! Toward Universal Abusive Language Detection Models":
Wang, K., Lu, D., Han, S. C., Long, S., & Poon, J. (2020).
Detect All Abuse! Toward Universal Abusive Language Detection Models.
In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), pp. 6366-6376.
The code is mainly adapted from https://github.com/kamalkraj/Named-Entity-Recognition-with-Bidirectional-LSTM-CNNs
Download the following three files from https://github.com/kamalkraj/Named-Entity-Recognition-with-Bidirectional-LSTM-CNNs/tree/master/data (a download sketch follows the list):
- train.txt
- test.txt
- val.txt
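
A minimal sketch for fetching these files programmatically, assuming they are still served from the raw GitHub path of that repository:

```python
import urllib.request

# Raw GitHub path of the data folder in the source repository (assumption).
BASE = ("https://raw.githubusercontent.com/kamalkraj/"
        "Named-Entity-Recognition-with-Bidirectional-LSTM-CNNs/master/data/")

for name in ["train.txt", "test.txt", "val.txt"]:
    urllib.request.urlretrieve(BASE + name, name)
    print("downloaded", name)
```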
Download the GloVe embeddings from http://nlp.stanford.edu/data/glove.6B.zip and unzip the archive; the model uses (a loading sketch follows):
- glove.6B.100d.txt
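
A minimal sketch for loading the GloVe vectors into a word-to-vector dictionary; the file is assumed to sit in the working directory:

```python
import numpy as np

def load_glove(path="glove.6B.100d.txt"):
    """Read GloVe text format: one word followed by its vector per line."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

# glove = load_glove()
# print(glove["the"].shape)  # (100,)
```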
Please note:
- Use Direct Abuse Embedding to generate the D embedding.
- Change MAX_LEN to the maximum sentence length of your target dataset.
- The "sent_text" variable should be a list of the original sentences (see the sketch after this list).
Download the files from:
- https://drive.google.com/file/d/152264axxTfmuYfb_7oWYQJFCggt06CEC/view?usp=sharing
- https://drive.google.com/file/d/1059cRocqijTNzrl0UOXkFnngqpEZ54c1/view?usp=sharing
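
These are Google Drive share links; one way to fetch them from Python is the third-party gdown package. This is only a sketch, and the output filenames below are placeholders, not the actual file names:

```python
import gdown  # pip install gdown

# File IDs taken from the share links above; output names are placeholders.
gdown.download("https://drive.google.com/uc?id=152264axxTfmuYfb_7oWYQJFCggt06CEC",
               "drive_file_1", quiet=False)
gdown.download("https://drive.google.com/uc?id=1059cRocqijTNzrl0UOXkFnngqpEZ54c1",
               "drive_file_2", quiet=False)
```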
Use Sarcasm Embedding Input to generate the Implicit Input.
After running Sarcasm Embedding, copy and paste the resulting embedding into a text file named "sarcasm_embedding.txt" (or save and load it programmatically as sketched below).
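
The copy-and-paste step can also be done with NumPy; a sketch assuming the Sarcasm Embedding output is a 2-D array with one row per sentence:

```python
import numpy as np

# Placeholder: replace with the array produced by Sarcasm Embedding
# (rows = sentences, columns = embedding dimensions).
sarcasm_embedding = np.zeros((3, 100), dtype="float32")

# Write it to the text file the later steps expect ...
np.savetxt("sarcasm_embedding.txt", sarcasm_embedding)

# ... and read it back when building the Implicit Input.
loaded = np.loadtxt("sarcasm_embedding.txt")
print(loaded.shape)  # (3, 100)
```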
- Use Linguistic Behavior to generate the User Linguistic Behavior Embedding.
- Change the sentences and raw labels to those of your target dataset.
- Use Final Model for the final model training and prediction.
- Fill in "sentence_list" and "label_list" with your target dataset.
- Set "seq_length" to the sequence length of your choice.
- Set "use_gcn" according to whether you want to use the User Linguistic Behavior Embedding (a configuration sketch follows this list).