Project for training BERT-based Aspect-Based Sentiment / Emotion Analysis models. The implementation is based on the ABSA-PyTorch repository, with custom extensions such as preprocessing data from the .xlsx and .csv formats, sentence segmentation based on the sentence-splitter Python package, etc.
absa_babel_finetune/preprocessors/excel_to_sentences.py
- Processes input .xlsx files. Only the columns containing the ID and the text of the input files are parsed.
- The output file is a table in .csv format that contains the ID and the sentences produced by segmenting the text, in the following format:
- Each row is one sentence, whose ID is the original ID of the text + the character "_" + the sentence line number. The original ID of the text can therefore be recovered by stripping everything after the last "_".
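The ID scheme above can be sketched as follows. This is a hypothetical stand-in: the real preprocessor uses the sentence-splitter package, so the naive regex split here is only illustrative.

```python
import re

def segment_rows(rows):
    """Split each (id, text) pair into (id_n, sentence) rows.

    The actual script segments with the sentence-splitter package; a naive
    split on sentence-ending punctuation stands in here.
    """
    out = []
    for row_id, text in rows:
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        for n, sentence in enumerate(sentences, start=1):
            # Output ID = original ID + "_" + sentence line number.
            out.append((f"{row_id}_{n}", sentence))
    return out
```

Recovering the original text ID is then `sent_id.rsplit("_", 1)[0]`, which works regardless of how many digits the sentence number has.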
examples_predict.py
Assigns predictions to texts previously segmented into sentences (e.g. with preprocessors/sentence_splitter.py). The script uses the DataPreparator class from preprocessors/prepeare_data_for_prediction.py and the Predictor class from src/prediction.py.
The former is responsible for recognizing, in the sentences specified by the text_column variable of the config.py configuration file, the Named Entities for which a prediction can be made. As an internal representation, it stores the received data in a Python dictionary, which is currently not serialized at runtime.
The latter's task is to assign a prediction to each prepared (Named Entity + sentence) pair, using the BERT model initialized with the model_parameters options in config.py and the PyTorch checkpoint also specified there.
The output is an .xlsx file with the IDs, text fields, and the predictions given by the model (the latter stored in a column named by the predictions_column variable of config.py). The name of the output file is the original filename, extended with a '_predictions' suffix.
examples_train.py
The script uses the Trainer class from src/training.py, which initializes the given BERT model with the parameters stored in config.py and then fine-tunes the original model in the standard way (using early stopping).
The output is a PyTorch checkpoint that can be loaded later at prediction time.
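The early-stopping loop described above typically looks like this minimal sketch; the class name and patience value are assumptions, not the actual interface of src/training.py.

```python
class EarlyStopper:
    """Stop fine-tuning when validation loss hasn't improved for `patience` epochs.

    A stand-in for the early-stop logic in the Trainer; the real code also
    saves a PyTorch checkpoint (torch.save) whenever the loss improves.
    """
    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
            return False          # improved: keep training, checkpoint here
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```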
Model training is preceded by the sentence segmentation already described (if the data is not yet in the required format).
This is followed by Named Entity Recognition (using the spaCy language model specified in the spacy_model_name variable of config.py) and the construction of sentence + Named Entity pairs, also as already described.
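The sentence + entity pairing can be sketched as below. The sentence-pair format shown follows the scheme used by ABSA-PyTorch's BERT-SPC-style models (sentence and aspect as two segments); whether this project uses exactly this template is an assumption, and tokenization to IDs is left to the tokenizer.

```python
def make_sentence_entity_pairs(sentence, entities):
    """Pair one sentence with each recognized entity as a BERT sequence pair.

    The "[CLS] ... [SEP] ... [SEP]" template is illustrative; the aspect
    (Named Entity) is appended as the second segment.
    """
    return [f"[CLS] {sentence} [SEP] {entity} [SEP]" for entity in entities]
```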
If necessary, the train and test datasets can be created manually using the stratified_split function in preprocessors/stratified_split.py, which preserves the label distribution of the original dataset in both the train and test sets.
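A stratified split of that kind can be sketched as follows; this is a hypothetical stand-in for preprocessors/stratified_split.py, and the parameter names are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(samples, test_ratio=0.2, seed=42):
    """Split (text, label) samples so label proportions match in both halves.

    Group by label, shuffle each group, and cut the same fraction from every
    group, so the test set mirrors the original label distribution.
    """
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    rng = random.Random(seed)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = int(round(len(group) * test_ratio))
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test
```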
License: MIT