This repository implements a simple Multi-Task Learning (MTL) model using a Sentence Transformer backbone (DistilBERT). It demonstrates how to:
- Encode sentences into fixed-length embeddings
- Classify sentences for two different tasks (Task A and Task B)
- Train a model using a shared transformer and two task-specific heads
- Task A: Sentence classification (e.g., topic or intent)
- Task B: Sentiment analysis (e.g., positive, neutral, negative)
- Backbone Model: `distilbert-base-uncased` is chosen for its tradeoff between efficiency and performance.
- Pooling Strategy: Mean pooling of token embeddings is used to obtain fixed-length sentence embeddings. It averages only over non-padded tokens using the attention mask.
- Embedding Normalization: Final embeddings are L2 normalized to make them suitable for similarity tasks.
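A minimal sketch of this pooling-and-normalization step, assuming the Hugging Face `transformers` API (`mean_pool` is an illustrative helper name, not necessarily the repo's exact function):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")

def mean_pool(last_hidden_state, attention_mask):
    # Zero out padded positions, then average over the real tokens only
    mask = attention_mask.unsqueeze(-1).float()       # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)    # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)          # avoid division by zero
    return summed / counts

batch = tokenizer(["An example sentence."], padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state
emb = F.normalize(mean_pool(hidden, batch["attention_mask"]), p=2, dim=1)  # L2-normalized
```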
- Changes made: Multi-task architecture with shared transformer encoder and two task-specific heads
- Task-Specific Heads: Added two `nn.Linear` layers, one each for Task A and Task B
- Task A Head: A feedforward classification layer for sentence class (e.g., topic)
- Task B Head: Another classification head for sentiment (positive, negative)
- Shared Encoder: The same pretrained transformer, `distilbert-base-uncased`, produces sentence embeddings for both tasks
- Reused mean pooling + normalization for embeddings
- Multi-task Loss Handling: The losses for each task can be weighted and combined during training (not shown here since we're only outlining structure; a hedged sketch follows below)
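A minimal sketch of this architecture, reusing `mean_pool` and `F` from the snippet above (the class and argument names are illustrative, not necessarily the repo's exact ones):

```python
import torch.nn as nn
from transformers import AutoModel

class MultiTaskSentenceTransformer(nn.Module):
    def __init__(self, num_classes_a, num_classes_b, model_name="distilbert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)   # shared backbone
        hidden = self.encoder.config.hidden_size               # 768 for distilbert-base-uncased
        self.head_a = nn.Linear(hidden, num_classes_a)         # Task A: sentence classification
        self.head_b = nn.Linear(hidden, num_classes_b)         # Task B: sentiment analysis

    def forward(self, input_ids, attention_mask):
        states = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        emb = F.normalize(mean_pool(states, attention_mask), p=2, dim=1)
        return self.head_a(emb), self.head_b(emb)   # one set of logits per task
```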
- Freezing the entire network:
  - Implications: Fast inference only; no learning or adaptation; useful for static embeddings only
  - Advantages: Suited to evaluation-only or production inference
- Freezing only the transformer backbone:
  - Implications: Preserves general semantic knowledge; efficient training; prevents catastrophic forgetting
  - Advantages: Works well for tasks where general language understanding suffices but labels are task-specific
- Freezing one task-specific head:
  - Implications: Maintains performance on the frozen task; helps avoid performance drops during transfer
  - Advantages: Useful when reusing the model for a new task without harming performance on an old task (see the freezing sketch below)
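Each scenario amounts to toggling `requires_grad`; a sketch, assuming the illustrative `MultiTaskSentenceTransformer` above is instantiated as `model`:

```python
# Scenario 1: freeze the entire network (inference only)
for p in model.parameters():
    p.requires_grad = False

# Scenario 2: freeze only the shared transformer backbone
for p in model.encoder.parameters():
    p.requires_grad = False

# Scenario 3: freeze one task-specific head (here Task A) while the rest trains
for p in model.head_a.parameters():
    p.requires_grad = False
```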
Suppose this model is applied to a new domain-specific multi-task problem (e.g., scientific text classification and sentiment analysis on research abstracts):
- Choice of pre-trained model:
  - `distilbert-base-uncased` was used here for speed
  - For domain adaptation, `allenai/scibert_scivocab_uncased` is recommended for scientific text
- Freezing strategy:
  - Freeze lower transformer layers to retain general linguistic knowledge
  - Unfreeze upper layers and heads to adapt to task-specific signals without losing core representations
- Rationale (see the layer-freezing sketch below):
  - Lower layers capture general grammar and syntax
  - Higher layers adapt to task/domain semantics
  - Training the heads ensures the output space aligns with the new labels
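A sketch of this partial freezing, assuming the DistilBERT module layout (`embeddings` plus a 6-block `transformer.layer` list) and the illustrative `model` from above; the split point is an assumption:

```python
# Freeze the embeddings and the lower half of DistilBERT's 6 transformer blocks
for p in model.encoder.embeddings.parameters():
    p.requires_grad = False
for block in model.encoder.transformer.layer[:3]:
    for p in block.parameters():
        p.requires_grad = False
# The upper blocks and both task heads remain trainable
```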
- Data: simulated batch of tokenized text + labels for Task A and Task B
- Forward Pass: inputs pass through the shared encoder, then through shared pooling, and finally into the two task heads
- Loss Function: separate `CrossEntropyLoss` per task; combined loss = weighted sum
- Metrics: Per-task accuracy (classification accuracy on logits)
- Optimization: Backprop from the joint loss; both heads and (optionally) the encoder are updated (see the sketch after this list)
- Assumption: All tasks are defined over the same input text
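A minimal training-loop sketch under the assumptions above, reusing the imports and illustrative `model` from earlier; `loader`, `num_epochs`, and the task weights `w_a`/`w_b` are placeholders:

```python
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=2e-5)
w_a, w_b = 1.0, 1.0   # per-task loss weights (assumed equal here)

for epoch in range(num_epochs):
    for input_ids, attention_mask, labels_a, labels_b in loader:
        logits_a, logits_b = model(input_ids, attention_mask)
        # Combined loss = weighted sum of the per-task cross-entropy losses
        loss = w_a * criterion(logits_a, labels_a) + w_b * criterion(logits_b, labels_b)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Per-task accuracy computed from the logits
        acc_a = (logits_a.argmax(dim=1) == labels_a).float().mean().item()
        acc_b = (logits_b.argmax(dim=1) == labels_b).float().mean().item()
```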
Please refer to `task4.txt` for a summary of the training-loop implementation.
To reproduce the outputs, follow these steps:
- Clone the repository

```bash
git clone https://github.com/PreethaSaha/MTL_senTransformer.git
```

- Navigate to the project directory

```bash
cd MTL_senTransformer
```

- Create and activate a virtual environment

```bash
python3 -m venv venv        # Create a virtual environment named 'venv'
source venv/bin/activate    # Activate the virtual environment
```

Please ensure all the notebook files are in the same directory.

- Install the required dependencies

```bash
pip install -r requirements.txt
```
The script prints:
- sentence embeddings of 768 dimensions
- the cosine similarity between the two sentences, showcasing semantic encoding (see the sketch below)
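Because the embeddings are L2-normalized, cosine similarity reduces to a dot product; a sketch reusing the `tokenizer`, `encoder`, and `mean_pool` names from the first snippet (the sentences are placeholders, not the repo's actual samples):

```python
pair = tokenizer(["The cat sat on the mat.", "A kitten rested on the rug."],
                 padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**pair).last_hidden_state
emb = F.normalize(mean_pool(hidden, pair["attention_mask"]), p=2, dim=1)
similarity = (emb[0] @ emb[1]).item()   # cosine similarity of the unit-length embeddings
```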
We ran inference on 5 sample sentences. For each sentence, the script prints:
- the predicted labels for Task A and Task B
- the first 5 dimensions of the embedding
Since no training was performed, predictions are random and not yet meaningful.
For each sentence, the script prints:
- Input sentence
- Predicted and true labels, with their descriptions, for both tasks
And for each epoch:
- Loss
- Accuracy for each task
Low accuracy is expected with 5 sentences and 1 epoch; this setup is useful for testing the code structure, not model performance. To get meaningful accuracy, use a real dataset, train for multiple epochs with proper data splits, and evaluate on validation/test data for realistic metrics.