Our Federated Split Learning (FSL) algorithm cuts down on communication overheads in traditional Split Learning methods by directly estimating server-returned gradients at each client using auxiliary models. The auxiliary models are much smaller versions of the server model which are explicitly trained to estimate the gradients that the server model would return for the client's local input.
The algorithm is summarized in the following schematic:
Please refer to our paper for further details, and consider citing it if you find this work useful.
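For intuition, here is a minimal, illustrative PyTorch sketch of a client-side step that back-propagates through a small auxiliary model instead of waiting for server-returned gradients. All module and variable names are hypothetical, and the periodic alignment of the auxiliary model with the server is omitted; see the paper and code for the actual procedure.

```python
import torch
import torch.nn as nn

# Hypothetical modules: a client model up to the cut layer, and a small
# auxiliary model standing in for the (much larger) server model.
client_model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
aux_model = nn.Linear(128, 10)  # deliberately tiny compared to a real server model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(client_model.parameters(), lr=0.01)

def local_client_step(x, y):
    """One client update with no activations or gradients exchanged with the server.

    Back-propagating the auxiliary loss through aux_model yields an estimate of
    the gradient the server model would have returned at the cut layer.
    """
    optimizer.zero_grad()
    cut_features = client_model(x)                     # forward pass up to the cut layer
    est_loss = criterion(aux_model(cut_features), y)   # auxiliary estimate of the server-side loss
    est_loss.backward()                                # estimated gradients flow back into client_model
    optimizer.step()
    return est_loss.item()

# Dummy batch just to show the call signature
local_client_step(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,)))
```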
The project requirements can be installed using the environment config file `conda_env.yaml` as follows:

```bash
conda env create -f conda_env.yaml
```

This creates a conda environment named `sage`. You can activate it using:

```bash
conda activate sage
```

and all dependency requirements should be met.
This project is powered by Hydra, which allows hierarchical configurations and easy running of multiple ML experiments. The config files for Hydra are located in the `hydra_config` folder.
There is a high degree of customizability here; datasets, models and FL algorithms can be plugged in using configs. Please check out our contributing readme for more details.
Currently, datasets are read and imported from the `datas` folder in the root of the repository. You can simply create a folder for the dataset there and download the dataset into it. After performing the necessary preprocessing, simply use/extend the `get_dataset()` function in `datasets/__init__.py`.
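As a rough illustration of such an extension (a sketch only; the actual signature and return type of `get_dataset()` in `datasets/__init__.py` may differ, and the key `my_new_dataset` is hypothetical), a new dataset could be handled along these lines:

```python
# Illustrative sketch only; the real get_dataset() in datasets/__init__.py may differ.
import os
from torchvision import datasets, transforms

def get_dataset(name: str, root: str = "datas"):
    """Return (train_set, test_set) for a dataset key referenced in the Hydra configs."""
    tfm = transforms.ToTensor()
    if name == "cifar10":
        train = datasets.CIFAR10(os.path.join(root, "cifar10"), train=True, download=True, transform=tfm)
        test = datasets.CIFAR10(os.path.join(root, "cifar10"), train=False, download=True, transform=tfm)
        return train, test
    if name == "my_new_dataset":  # hypothetical key for a dataset you add
        # Load the preprocessed files you placed under <root>/my_new_dataset here
        # and return two torch.utils.data.Dataset objects.
        raise NotImplementedError
    raise ValueError(f"Unknown dataset key: {name}")
```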
To train FSL-SAGE with the defaults from `config.yaml`, you can simply run:

```bash
python main.py
```
Training results are saved in a `saves` folder in the root of the repository, organized into subfolders by FL algorithm, model, dataset, and data distribution.

The default number of clients (`num_clients`) is set to 10 and the default number of rounds is `rounds=200`. Each method trains up to the fixed number of `rounds`, or until the number of MBs specified in `comm_threshold_mb` is reached.
To choose a specific model or algorithm, Hydra's command-line override functionality can be used as follows:

```bash
python main.py model=resnet18 algorithm=cse_fsl dataset=cifar100 dataset.distribution=iid
```
The following options are currently supported:
Algorithm
Syntax: `algorithm=<key>`. The FL algorithm to use for training.

List of algorithms currently supported:

| Key | Algorithm |
|---|---|
| `fed_avg` | FedAvg |
| `sl_multi_server` | SplitFedv1 |
| `sl_single_server` | SplitFedv2 |
| `cse_fsl` | CSE-FSL |
| `fsl_sage` | FSL-SAGE |
Dataset
Syntax: `dataset=<key>`. The dataset used in training.

List of datasets currently supported:

| Key | Dataset |
|---|---|
| `cifar10` | CIFAR-10 |
| `cifar100` | CIFAR-100 |
Model
Syntax: `model=<key>`. The ML model to use for training.

List of models currently supported:

| Key | Model |
|---|---|
| `resnet18` | ResNet-18 |
| `resnet50` | ResNet-50 |
| `resnet56` | ResNet-56 |
| `resnet110` | ResNet-110 |
Note that the ResNet models above other than `resnet18` have not been tuned yet, so their results may not optimally represent FSL-SAGE's communication benefits.
Data distribution
Syntax: `dataset.distribution=<key>`. Determines the distribution of the dataset across clients.

List of distributions currently supported:

| Key | Distribution |
|---|---|
| `iid` | homogeneous |
| `noniid_dirichlet` | heterogeneous |
For `noniid_dirichlet`, you can specify the value of `alpha` using the key `dataset.alpha`, e.g., `dataset.alpha=1`.
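For intuition, Dirichlet-based splits typically draw per-client class proportions from a Dirichlet(`alpha`) distribution, so smaller `alpha` values produce more skewed clients. The snippet below is an illustrative NumPy sketch of this idea, not the repository's partitioning code:

```python
import numpy as np

def dirichlet_partition(labels, num_clients=10, alpha=1.0, seed=0):
    """Illustrative Dirichlet partition: smaller alpha -> more heterogeneous clients."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Sample how this class is shared across the clients.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, splits)):
            client_indices[client].extend(part.tolist())
    return client_indices

# Example: 10 clients over 1000 dummy labels drawn from 10 classes
parts = dirichlet_partition(np.random.randint(0, 10, size=1000), num_clients=10, alpha=0.5)
```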
We also support running multiruns in parallel using the hydra-joblib-launcher. Thus, it is possible to run multiple experiments for different combinations of hyperparameters, models, datasets or algorithms, given sufficient GPU memory:

```bash
python main.py -m model=resnet18,simple_conv algorithm=fed_avg,sl_single_server,sl_multi_server,cse_fsl,fsl_sage
```

The above creates parallel jobs that run main.py on all combinations of the specified options. The number of jobs can be controlled by modifying the `hydra.launcher.n_jobs` option in `config.yaml` or by passing `hydra.launcher.n_jobs=<jobs>` on the command line.
To generate the plots for accuracy, communication load, etc., please check out the readme, the functions in `plot_results.py`, and the configs in `exp_configs.yaml`.
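If you want to script a quick figure yourself, the following matplotlib sketch shows the typical accuracy-versus-communication plot using dummy curves only; loading real results from the `saves` folder is left to `plot_results.py` and `exp_configs.yaml`:

```python
import numpy as np
import matplotlib.pyplot as plt

# Dummy curves standing in for results loaded from the saves/ folder;
# see plot_results.py and exp_configs.yaml for the real loading logic.
comm_mb = np.linspace(0, 1000, 50)
curves = {
    "fed_avg": 0.6 * (1 - np.exp(-comm_mb / 400)),
    "fsl_sage": 0.6 * (1 - np.exp(-comm_mb / 150)),
}

for name, acc in curves.items():
    plt.plot(comm_mb, acc, label=name)
plt.xlabel("Communication (MB)")
plt.ylabel("Test accuracy")
plt.legend()
plt.tight_layout()
plt.savefig("accuracy_vs_comm.png")
```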
This code has been extended to support LLMs, and a basic natural language generation (NLG) setup using the WebNLG E2E dataset has been tested. We use LoRA fine-tuning on the GPT-2 medium model to learn the E2E task. The code is available on the `llm` branch of this repository and is built upon the LoRA codebase.
The code currently does not use the PEFT or Transformers libraries by HuggingFace, since building model splitting on top of those is challenging. However, contributions to this end on the `llm` branch would be welcome.
We would like to encourage research on developing automatic model splitters for PyTorch and other popular deep learning frameworks, since these could become increasingly relevant as split learning or federated split learning methods become more popular.
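As a pointer to what model splitting involves, here is a small, purely illustrative sketch (not the `llm` branch's implementation; the toy blocks and cut index are hypothetical) that splits a stack of transformer blocks into client-side and server-side modules at a chosen cut point:

```python
import torch.nn as nn

def split_at(layers, cut_index):
    """Manually split a list of layers into client-side and server-side modules."""
    client_part = nn.Sequential(*layers[:cut_index])
    server_part = nn.Sequential(*layers[cut_index:])
    return client_part, server_part

# Hypothetical example: a toy stack standing in for transformer blocks.
blocks = [nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True) for _ in range(6)]
client_model, server_model = split_at(blocks, cut_index=2)
```

An automatic splitter would have to infer such cut points (and handle residual connections, shared embeddings, etc.) directly from the model graph, which is what makes this an interesting research direction.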