This project implements aspect-based sentiment analysis using BERT models in a PyTorch environment. The `config.py` file contains the key settings for training and prediction with English and Hungarian models.
Before running the project, install the required dependencies listed in `requirements_rtx30.txt`:

```shell
pip install -r requirements_rtx30.txt
```
The code expects a raw, unprocessed `.xlsx` file as input for both training and prediction tasks. File processing and the transformations required by the model are handled internally by the code.
- Text Column: The `.xlsx` file must include the column specified by `text_column` in `config.py`, which contains the text data.
During prediction, the identified aspects will be stored in the following columns:
- Named Entity: The column specified by `NE_column` will contain the named entities extracted from the text.
- Named Entity Type: The column specified by `NE_type_column` will contain the type of each named entity.
- Sentiment Prediction: The column specified by `predictions_column` will contain the sentiment prediction results.
The output is organized in a sentence + named entity pair format. Each row will contain:
- The sentence text
- The named entity and its type
- The sentiment prediction value
This format enables easy analysis by linking each entity with its corresponding sentiment and context within the sentence.
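To make that layout concrete, here is a minimal sketch of the sentence + named-entity pair format in plain Python. The column names mirror the `config.py` options described below, and the rows are invented examples, not real output:

```python
# Minimal sketch of the per-row output format (one row per sentence +
# named-entity pair). Column names and values are illustrative only; the
# real names come from text_column, NE_column, NE_type_column and
# predictions_column in config.py.
rows = [
    {"text": "Apple released a new phone.",
     "NE": "Apple", "NE_type": "ORG", "prediction": "positive"},
    {"text": "The service in Budapest was slow.",
     "NE": "Budapest", "NE_type": "LOC", "prediction": "negative"},
]

for row in rows:
    print(f"{row['text']} | {row['NE']} ({row['NE_type']}) -> {row['prediction']}")
```

Because each row pairs exactly one entity with one sentence, a sentence containing several named entities appears in several rows, each with its own sentiment prediction.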
- Main script: `./examples/examples_predict.py`
- Dependencies:
  - Data Preparation: NER-based data preprocessing is handled by `./preprocessors/prepeare_data_for_prediction.py` (`DataPreparator` class), which transforms raw `.xlsx` data into a prediction-ready format.
  - Prediction: Predictions are made using `./src/prediction.py` (`Predictor` class).
  - Configurations: All settings are specified in `./config.py`.
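The two-stage flow above can be sketched with stand-in classes. The real `DataPreparator` and `Predictor` live in the files listed, and the method names used here (`prepare`, `predict`) are assumptions rather than the repository's confirmed API; see `./examples/examples_predict.py` for the authoritative usage:

```python
# Stand-in sketch of the two-stage prediction flow. The real classes live in
# ./preprocessors/prepeare_data_for_prediction.py and ./src/prediction.py;
# the method names used here (prepare, predict) are illustrative assumptions.

class DataPreparator:
    def prepare(self, texts):
        # Real version: read the raw .xlsx file, run spaCy NER, and emit
        # one sentence + named-entity pair per row.
        return [{"text": t, "NE": "ACME", "NE_type": "ORG"} for t in texts]

class Predictor:
    def predict(self, examples):
        # Real version: run the fine-tuned BERT model on each pair and
        # write the result into the predictions column.
        return [dict(e, prediction="neutral") for e in examples]

raw = ["ACME opened a new office."]
prepared = DataPreparator().prepare(raw)
results = Predictor().predict(prepared)
print(results)
```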
- Main script: `./examples/examples_train.py`
- Dependencies:
  - Training: The training process is controlled by `./src/training.py` (`Trainer` class).
  - Configurations: All settings are specified in `./config.py`.
- dataset_name: Name of the dataset, in this case: `Validated`.
- test_size: Proportion of the dataset to use as the test set, e.g., `0.2` (20%).
- text_column: Column name for the text data.
- NE_column: Column name for Named Entity (NER) labeling.
- NE_type_column: Column for the type of Named Entity.
- predictions_column: Column for storing prediction results.
- checkpoint: Path to the BERT model checkpoint containing the latest training state.
- train_dataset and test_dataset: Paths to the English and Hungarian training and test datasets.
- bert_model: The BERT model to use. For English: `bert-base-cased`; for Hungarian: `SZTAKI-HLT/hubert-base-cc`.
- spacy_model_name: SpaCy model name for NER, e.g., `en_core_web_lg` for English or `hu_core_news_lg` for Hungarian.
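An illustrative `config.py` fragment for the dataset and column settings above (the variable names follow this README, but the column-name values are placeholders for demonstration; check the actual file):

```python
# Illustrative fragment of config.py covering the dataset and model-name
# settings documented above. Column-name values are placeholder assumptions.
dataset_name = "Validated"
test_size = 0.2                     # 20% of the data held out for testing
text_column = "text"                # assumed column name
NE_column = "NE"                    # assumed
NE_type_column = "NE_type"          # assumed
predictions_column = "predictions"  # assumed
bert_model = "SZTAKI-HLT/hubert-base-cc"  # "bert-base-cased" for English
spacy_model_name = "hu_core_news_lg"      # "en_core_web_lg" for English
```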
- dropout: Dropout rate (0.01).
- bert_dim: Hidden layer dimension of the BERT model (768).
- polarities_dim: Number of sentiment polarities (3).
- max_seq_len: Maximum input sequence length for BERT (85).
- bert_model_name: The name of the BERT model used.
- optimizer: Optimization algorithm, e.g., `adam`.
- initializer: Weight initialization method, e.g., `xavier_uniform_`.
- lr: Learning rate, set to `2e-5`.
- l2reg: L2 regularization factor (0.01).
- num_epoch: Number of epochs during training (20).
- batch_size: Batch size (16).
- log_step: Step interval for logging (10).
- embed_dim and hidden_dim: Dimensions of the embedding and hidden layers (300).
- hops: Steps for the attention mechanism (3).
- patience: Number of epochs to wait for improvement before stopping (5).
- device: Device for computation (CPU or GPU).
- seed: Seed for randomness (1234).
- valset_ratio: Proportion of the data split off as a validation set (set to 0, so no separate validation set is used).
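Collected as a `config.py`-style fragment, the hyperparameters above would look roughly like this (variable names are inferred from this README and may differ in the actual file):

```python
# Model and training hyperparameters as documented above; names inferred
# from this README, values as listed.
dropout = 0.01
bert_dim = 768            # BERT hidden size
polarities_dim = 3        # e.g., negative / neutral / positive
max_seq_len = 85
optimizer = "adam"
initializer = "xavier_uniform_"
lr = 2e-5
l2reg = 0.01
num_epoch = 20
batch_size = 16
log_step = 10
embed_dim = 300
hidden_dim = 300
hops = 3                  # attention-mechanism steps
patience = 5              # early-stopping patience in epochs
device = "cpu"            # or "cuda" when a GPU is available
seed = 1234
valset_ratio = 0
```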
MIT