GitHub - uveges/ABSA-PyTorch: Aspect Based Sentiment Analysis, PyTorch Implementations.

Project description

Project for training BERT-based models for Aspect-Based Sentiment / Emotion analysis. The implementation is based on the ABSA-PyTorch repository, with custom extensions such as preprocessing data from .xlsx and .csv formats and sentence segmentation based on the sentence-splitter Python package.

Use-cases

Prediction with finetuned model(s)

absa_babel_finetune/preprocessors/excel_to_sentences.py

  • Process input .xlsx files. Only the columns containing ID and text of the input files are parsed.
  • The output file is a table in .csv format that contains the IDs and the sentences generated by segmenting the text, in the following format:
    • Each row is a sentence, whose ID is the original ID of the text + the character "_" + the sentence number. The original ID of the text can therefore be recovered by stripping the final "_" and the sentence number that follows it.
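The ID scheme above can be sketched in a few lines. This is only an illustration: the helper names `make_sentence_id` and `original_id` are hypothetical and not taken from the repository, and starting the sentence numbering at 1 is an assumption.

```python
def make_sentence_id(text_id: str, sentence_number: int) -> str:
    """Build a sentence ID as <original ID>_<sentence number>."""
    return f"{text_id}_{sentence_number}"


def original_id(sentence_id: str) -> str:
    """Recover the original text ID by cutting at the LAST underscore.

    Splitting from the right keeps this correct when the original ID
    itself contains underscores or the sentence number has several digits.
    """
    return sentence_id.rsplit("_", 1)[0]


# Hypothetical example: one text with ID "doc_7" split into two sentences.
sentences = ["First sentence.", "Second one."]
rows = [
    (make_sentence_id("doc_7", number), sentence)
    for number, sentence in enumerate(sentences, start=1)  # start=1 is assumed
]
```

Splitting at the last underscore, rather than removing a single trailing character, also covers texts with ten or more sentences.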

examples_predict.py

Assigns predictions to texts previously segmented into sentences (e.g. with preprocessors/sentence_splitter.py). The script uses the DataPreparator class available in preprocessors/prepeare_data_for_prediction.py and the Predictor class from src/prediction.py.

The former is responsible for recognizing the Named Entities for which predictions can be made. It looks for them in the sentences of the column specified in the text_column variable of the config.py configuration file. As an internal representation, it stores the received data in a Python dictionary, which is currently not serialized at runtime.
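A minimal sketch of how (Named Entity + sentence) pairs might be collected into a dictionary is shown below. It is not the DataPreparator implementation: the function `build_pairs`, the dictionary layout, and the toy recognizer are all assumptions made for illustration. With spaCy, the recognizer could instead be something like `lambda s: [ent.text for ent in nlp(s).ents]`, where `nlp` is a loaded language model.

```python
from typing import Callable, Dict, List, Tuple


def build_pairs(
    sentences: Dict[str, str],
    find_entities: Callable[[str], List[str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each sentence ID to its (entity, sentence) pairs.

    Sentences without any recognized entity are skipped, since no
    prediction can be made for them.
    """
    pairs: Dict[str, List[Tuple[str, str]]] = {}
    for sentence_id, sentence in sentences.items():
        entities = find_entities(sentence)
        if entities:
            pairs[sentence_id] = [(entity, sentence) for entity in entities]
    return pairs


def toy_entities(sentence: str) -> List[str]:
    """Stand-in recognizer (NOT spaCy): capitalized tokens after the first."""
    tokens = sentence.rstrip(".").split()
    return [token for token in tokens[1:] if token[0].isupper()]


# Hypothetical input: sentence IDs mapped to sentence text.
prepared = build_pairs(
    {"1_1": "Yesterday Anna met Bob.", "1_2": "It rained."},
    toy_entities,
)
```

Passing the recognizer in as a callable keeps the pairing logic independent of any particular NER library.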

The latter's task is to assign predictions to the prepared (Named Entity + sentence) pairs. It uses the BERT model initialized with the model_parameters options in config.py and the previously saved PyTorch checkpoint that is also specified there. The output is an .xlsx file with the IDs, text fields, and predictions given by the model (the predictions are stored in the column named by the predictions_column variable of config.py). The name of the output file is the original filename, extended with a '_predictions' suffix.
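The output naming rule can be illustrated with `pathlib`. The helper `predictions_path` is hypothetical, and placing the suffix between the file stem and the `.xlsx` extension is an assumption about how the suffix is applied.

```python
from pathlib import Path


def predictions_path(input_path: str) -> Path:
    """Return the assumed output path: <original stem>_predictions.xlsx."""
    source = Path(input_path)
    # with_name() keeps the parent directory and swaps only the file name.
    return source.with_name(f"{source.stem}_predictions.xlsx")


# Hypothetical input file name.
output = predictions_path("data/reviews.xlsx")
```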

Model training

examples_train.py

The script uses the Trainer class of src/training.py, which initializes the given BERT model with the parameters stored in config.py and then fine-tunes it in the standard way (using early stopping). The output is a PyTorch checkpoint that can be loaded later at prediction time.
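For readers unfamiliar with early stopping, the following framework-free sketch shows the general idea. It is not the Trainer class: the function name, the `patience` parameter, and the use of a validation score where higher is better are assumptions.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    train_epoch: Callable[[int], None],
    evaluate: Callable[[int], float],
    max_epochs: int = 20,
    patience: int = 3,
) -> Tuple[int, float]:
    """Train until the validation score stops improving.

    Returns the index and score of the best epoch. Training stops once
    `patience` consecutive epochs fail to beat the best score so far.
    """
    best_epoch, best_score = -1, float("-inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_epoch(epoch)
        score = evaluate(epoch)
        if score > best_score:
            best_epoch, best_score = epoch, score
            epochs_without_improvement = 0
            # A real trainer would save a checkpoint here,
            # e.g. with torch.save(model.state_dict(), path).
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_score


# Hypothetical validation scores, one per epoch.
scores = [0.60, 0.70, 0.72, 0.71, 0.70, 0.69, 0.90]
trained = []
result = train_with_early_stopping(trained.append, lambda epoch: scores[epoch])
```

In this run the score peaks at epoch 2, and training stops after epoch 5 because three epochs in a row brought no improvement, so the later score of 0.90 is never reached.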

Model training is preceded by sentence segmentation as described above (if the data is not already in the required format). This is followed by Named Entity recognition, using the spaCy language model specified in the spacy_model_name variable of config.py, and by the construction of sentence + Named Entity pairs, also as described above.

If necessary, the train and test datasets can be created manually using the stratified_split function in preprocessors/stratified_split.py, which preserves the label distribution of the original dataset in both the train and test sets.
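To make the idea of a stratified split concrete, here is a small pure-Python sketch. It does not reproduce the repository's stratified_split function, whose signature is not shown here: the name `stratified_split_sketch`, its parameters, and the row layout are assumptions.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple


def stratified_split_sketch(
    rows: Sequence[tuple],
    label_of: Callable[[tuple], Hashable],
    test_ratio: float = 0.2,
    seed: int = 42,
) -> Tuple[List[tuple], List[tuple]]:
    """Split rows so each label keeps roughly the same share in both parts."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical data: 80 positive and 20 negative (sentence ID, label) rows.
data = [(f"p_{i}", "positive") for i in range(80)]
data += [(f"n_{i}", "negative") for i in range(20)]
train_rows, test_rows = stratified_split_sketch(data, label_of=lambda row: row[1])
test_counts = Counter(label for _, label in test_rows)
```

Splitting each label group separately is what keeps the 80/20 label ratio of the full dataset intact in both the train and the test set.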

Licence

MIT
