Modern configurable systems provide tremendous opportunities for engineering future intelligent software systems. A key difficulty thereof is how to effectively self-adapt the configuration of a running system such that its performance (e.g., runtime and throughput) can be optimized under time-varying workloads. This unfortunately remains unaddressed in existing approaches, as they either overlook the available past knowledge or rely on static exploitation of past knowledge without reasoning about the usefulness of information when planning for self-adaptation. In this paper, we tackle this challenging problem by proposing DLiSA, a framework that self-adapts configurable systems. DLiSA comes with two properties: firstly, it supports lifelong planning, whereby the planning process runs continuously throughout the lifetime of the system, allowing dynamic exploitation of the accumulated knowledge for rapid adaptation. Secondly, the planning for a newly emerged workload is boosted via distilled knowledge seeding, in which the knowledge is dynamically purified such that only useful past configurations are seeded when necessary, mitigating misleading information.
Extensive experiments suggest that the proposed DLiSA significantly outperforms state-of-the-art approaches, demonstrating a performance improvement of up to 229% and a resource acceleration of up to 2.22x in generating promising adaptation configurations.
This repository contains the key code, the full data used, and the raw experiment results for the paper.
The `dataset` folder contains configuration datasets for the following 9 subject systems, each stored in a separate subfolder. The name of each subfolder corresponds to the system it represents.
System | Language | Domain | Performance | #O | #C | #W | Optimization Goal |
---|---|---|---|---|---|---|---|
JUMP3R | Java | Audio Encoder | Runtime | 16 | 4196 | 6 | Minimization |
KANZI | Java | File Compressor | Runtime | 24 | 4112 | 9 | Minimization |
DCONVERT | Java | Image Scaling | Runtime | 18 | 6764 | 12 | Minimization |
H2 | Java | Database | Throughput | 16 | 1954 | 8 | Maximization |
BATLIK | Java | SVG Rasterizer | Runtime | 10 | 1919 | 11 | Minimization |
XZ | C/C++ | File Compressor | Runtime | 33 | 1999 | 13 | Minimization |
LRZIP | C/C++ | File Compressor | Runtime | 11 | 190 | 13 | Minimization |
X264 | C/C++ | Video Encoder | Runtime | 25 | 3113 | 9 | Minimization |
Z3 | C/C++ | SMT Solver | Runtime | 12 | 1011 | 12 | Minimization |
- #O: Number of options.
- #C: Number of configurations.
- #W: Number of workloads tested.
- Optimization Goal: Whether the goal is minimization (e.g., reducing runtime) or maximization (e.g., increasing throughput).
Each system folder contains 6 to 13 workloads, depending on the system. The data is stored in `.csv` format, where each CSV file follows the structure below:
- Columns 1 to n-1: Configuration options, whose values can be either discrete or continuous.
- Column n: Performance objective, represented as a numeric value (e.g., runtime, throughput).
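To illustrate, here is a minimal sketch of loading one workload CSV and splitting it into options and performance (assuming pandas is installed; the path `dataset/batlik/corona.csv` is only a hypothetical example):

```python
import pandas as pd

# Hypothetical example: the corona workload of the batlik system.
data = pd.read_csv("dataset/batlik/corona.csv")

# Columns 1 to n-1 are configuration options; column n is the performance objective.
configs = data.iloc[:, :-1]  # configuration options (discrete or continuous)
perf = data.iloc[:, -1]      # performance objective (e.g., runtime or throughput)

print(f"{configs.shape[1]} options, {len(data)} measured configurations")
```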
The `order_files` folder contains the execution order of workloads for each system over 100 independent runs.
The `results` folder contains the optimization results for each system. Each system has its own subfolder where the corresponding optimization outcomes are stored. Within each system's folder, the `tuning_results` directory stores the detailed optimization results for all compared algorithms, including both state-of-the-art baselines and our proposed method.
- D-SOGA, FEMOSAA, LiDOS, Seed-EA: State-of-the-art (SOTA) baseline algorithms for comparison.
- DLiSA-0.x: Our proposed DLiSA method, evaluated with different hyperparameter $\alpha$ settings.
- DLiSA-I, DLiSA-II: Two ablation study variants of DLiSA.
Each algorithm folder contains results from 100 independent runs, where the optimization process is stored in separate directories:
results/
│ ├── batlik/
│ ├── tuning_results/
│ │ ├── D-SOGA/
│ │ │ ├── optimized_pop_config_run_0/
│ │ │ │ ├── corona_config.csv
│ │ │ │ └── corona_indices.csv
│ │ │ ├── optimized_pop_perf_run_0/
│ │ │ │ ├── corona_perf.csv
│ │ │ │ ├── corona_time.csv
│ │ │ │ └── evolutionary_process_in_corona/
│ │ │ │ │ ├── generation_0_indices.csv
│ │ │ │ │ ├── generation_0_config.csv
│ │ │ │ │ ├── generation_0_perf.csv
│ │ │ │ │ ├── ...
- `optimized_pop_config_run_{run}/`: Stores the optimized configurations for the given run. `corona_config.csv` stores the configuration details, while `corona_indices.csv` stores additional indices or data related to the configurations.
- `optimized_pop_perf_run_{run}/`: Stores the corresponding performance results for the optimized configurations in the given run. `corona_perf.csv` stores the performance of the optimized configurations, while `corona_time.csv` records the execution time of the algorithm for processing the corona workload.
- `evolutionary_process_in_corona/`: Stores the intermediate data for each generation during the optimization process, covering all generations throughout the evolutionary process for the corona workload. `generation_0_config.csv` stores the configurations for the specified generation, `generation_0_perf.csv` stores the corresponding performance values for those configurations, and `generation_0_indices.csv` stores additional indices or metadata associated with the configurations.
Since all algorithms are population-based methods, the results are stored in `.csv` format, representing the final optimized population.
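As an example of how these result files might be consumed, below is a minimal sketch that loads the final optimized population of one algorithm for one run; the paths follow the tree above, while the presence of header rows in the CSV files is an assumption (use `header=None` if they have none):

```python
import pandas as pd

run = 0
base = "results/batlik/tuning_results/D-SOGA"

# Final optimized configurations and their measured performance for this run.
configs = pd.read_csv(f"{base}/optimized_pop_config_run_{run}/corona_config.csv")
perfs = pd.read_csv(f"{base}/optimized_pop_perf_run_{run}/corona_perf.csv")

# For a minimization objective such as runtime, the best member of the
# final population is the one with the smallest performance value.
best_idx = perfs.iloc[:, -1].idxmin()
print("Best configuration in the final population:")
print(configs.iloc[best_idx])
```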
The `statistical tool` folder contains scripts for conducting statistical analyses related to all research questions (RQs) and their corresponding results.
The supplementary file contains the specific workloads for the 9 subject systems and detailed experimental results for RQ2 and RQ3.
- main.py: the main program for using DLiSA, which automatically reads data from CSV files, implements Knowledge Distillation and Evolutionary Planning, and saves the corresponding results and data.
- Adaptation_Optimizer.py: the program for running Knowledge Distillation.
- Genetic_Algorithm.py: the program for running Evolutionary Planning.
- requirements.txt: the required packages for running main.py.
- Download all the files into the same folder, or clone the repository.
- Install Python:
  - The code has been tested with and is recommended to run on Python 3.9; other versions might cause errors.
  - Using other versions may require adjusting dependency versions; otherwise, package compatibility issues may occur.
- Using the command line: `cd` to the folder with the code and install all the required packages by running:
pip install -r requirements.txt
- Install R: R is required to run the experiments, and version 4.4 or later is recommended. Please follow the installation instructions for your operating system at the R Project website.
- The `devtools` package is required for running the statistical analysis scripts (RQ1 and RQ4). Our scripts dynamically check for the `devtools` package and install it automatically if it is missing, so users can run the scripts without additional setup. However, if the automatic installation fails due to system-specific configurations, users can manually install `devtools` in R by running:
  ```r
  install.packages("devtools")
  ```
- For Ubuntu users, the following Debian packages should be installed before running the statistical analysis scripts (e.g., `RQ1_Scott-Knott.py`):
  ```bash
  sudo apt-get update && sudo apt-get install -y \
    libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev \
    libharfbuzz-dev libfribidi-dev libfontconfig1-dev \
    libcurl4-openssl-dev
  ```
- Command line: `cd` to the folder with the code and enter the command below; the rest of the process is fully automated.
python main.py
- Python IDE (e.g., PyCharm): Open the main.py file in the IDE and simply click 'Run'.
The main.py script is structured with several key components:
- Parameters Settings: Configure essential parameters like the number of independent runs, maximum generations for GA, and population size.
- Systems: Define the list of systems to be optimized.
- Compared Algorithms: Specify which algorithms to compare for system performance tuning.
- Optimization Goal: Set the optimization objective for each system (e.g., maximize throughput or minimize runtime).
Modify these parameters in main.py to suit your optimization needs:
- run: Number of independent optimization runs (e.g., run = 100).
- max_generation: Maximum number of generations for the evolutionary process (e.g., max_generation = 3).
- pop_size: Population size for the Genetic Algorithm (e.g., pop_size = 20).
- mutation_rate: Probability of mutation in the evolutionary process (e.g., mutation_rate = 0.1).
- crossover_rate: Probability of crossover (e.g., crossover_rate = 0.9).
- systems: List of systems for optimization (e.g., systems = ['batlik', 'kanzi']).
- compared_algorithm: Algorithms for comparison (e.g., compared_algorithm = ['FEMOSAA', 'Seed-EA', 'D-SOGA', 'DLiSA-0.3']).
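As a rough sketch (not a verbatim excerpt of main.py), this parameter block might look like:

```python
# Parameter settings (illustrative values; adjust to your needs)
run = 100             # number of independent optimization runs
max_generation = 3    # maximum generations for the evolutionary process
pop_size = 20         # population size for the Genetic Algorithm
mutation_rate = 0.1   # mutation probability
crossover_rate = 0.9  # crossover probability

# Systems to optimize and algorithms to compare
systems = ['batlik', 'kanzi']
compared_algorithm = ['FEMOSAA', 'Seed-EA', 'D-SOGA', 'DLiSA-0.3']
```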
After setting the necessary parameters and successfully running the project, the statistical analysis scripts in the `statistical tool` folder can be used to compare the performance of all algorithms.
The script `statistical tool/RQ1_Scott-Knott.py` performs Scott-Knott ranking analysis to evaluate the performance of different algorithms on the specified systems. The results are stored in the `statistical tool/RQ1/` directory, with each system's ranking saved in a CSV file: `statistical tool/RQ1/{system_name}.csv`.
Each CSV file follows the structure shown below:
Workload | Algorithm | r | Mean (Std) |
---|---|---|---|
corona | FEMOSAA | 3 | 0.953 (0.043) |
corona | Seed-EA | 2 | 0.912 (0.031) |
corona | D-SOGA | 2 | 0.912 (0.024) |
corona | LiDOS | 2 | 0.916 (0.025) |
corona | DLiSA-0.3 | 1 | 0.907 (0.014) |
- Column `r`: Represents the ranking of each algorithm, where rank 1 indicates the best performance.
- Column `Mean (Std)`: Displays the mean performance value across 100 independent runs, along with the standard deviation in parentheses.
The primary goal of our study is to maximize the frequency at which DLiSA ranks 1, demonstrating its superior performance compared to SOTA algorithms. To allow a more intuitive comparison, RQ1_Scott-Knott.py also prints the number of times each algorithm achieves rank 1 directly in the console, making it easy to assess which algorithm performs best across different workloads and systems.
By running this script, you can quantitatively compare the optimization methods and validate the effectiveness of DLiSA against SOTA algorithms.
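For instance, the rank-1 counts can also be tallied offline from the generated CSV files with a short script like the sketch below (this mirrors what RQ1_Scott-Knott.py prints but is not the script itself; pandas is assumed, and the column names follow the table above):

```python
import glob
from collections import Counter

import pandas as pd

rank1_counts = Counter()
# Each CSV in "statistical tool/RQ1/" holds one system's Scott-Knott ranking.
for path in glob.glob("statistical tool/RQ1/*.csv"):
    ranking = pd.read_csv(path)
    # Count how often each algorithm is placed in the best (rank 1) group.
    winners = ranking.loc[ranking["r"] == 1, "Algorithm"]
    rank1_counts.update(winners)

for algorithm, count in rank1_counts.most_common():
    print(f"{algorithm}: rank 1 in {count} workload(s)")
```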
The script `statistical tool/RQ2_efficiency_analysis.py` performs convergence analysis to evaluate the efficiency of different algorithms on the specified systems. Please refer to the paper for the detailed calculation process. The results are stored in the `statistical tool/RQ2/` folder as CSV files, each comparing the efficiency (convergence speed) of DLiSA against a baseline algorithm on a given system. For example, the file `convergence_kanzi_DLiSA-0.3_vs_FEMOSAA.csv` compares DLiSA-0.3 (DLiSA with parameter $\alpha = 0.3$) against FEMOSAA on the KANZI system, with the following columns:
workload | baseline_b | baseline_T | compared_m | s |
---|---|---|---|---|
ambivert | 80 | 1.8298 | 39 | 2.051282051 |
artificl | 80 | 0.1657 | 39 | 2.051282051 |
deepfield | 80 | 0.2926 | — | -1 |
enwik8 | 80 | 2.6246 | 38 | 2.105263158 |
fannie_mae_500k | 80 | 1.8841 | 38 | 2.105263158 |
large | 80 | 0.6867 | 39 | 2.051282051 |
misc | 80 | 0.2426 | 39 | 2.051282051 |
silesia | 80 | 5.3858 | 38 | 2.105263158 |
vmlinux | 80 | 1.0921 | 39 | 2.051282051 |
- workload: The tested workload in the KANZI system.
- baseline_b: The baseline, b, identified for the counterpart (baseline) algorithm, i.e., the smallest number of measurements it needs to reach its best performance (baseline_T), averaged over 100 runs.
- baseline_T: The baseline algorithm's best performance, averaged over 100 runs.
- compared_m: The smallest number of measurements required by DLiSA to obtain a result equivalent to or better than baseline_T.
- s: The speedup of DLiSA over its counterpart (i.e., s = baseline_b / compared_m), where `s = -1` indicates that convergence was not achieved.
  - In the paper, we use `s = N/A` to represent cases where DLiSA cannot achieve the baseline_T reached by its counterpart.
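A small sketch of this speedup computation, consistent with the example values in the table above:

```python
from typing import Optional


def speedup(baseline_b: int, compared_m: Optional[int]) -> float:
    """Speedup of DLiSA over its counterpart: s = baseline_b / compared_m.

    compared_m is None when DLiSA never matches baseline_T, which the CSV
    files encode as s = -1 (reported as N/A in the paper).
    """
    if compared_m is None:
        return -1.0
    return baseline_b / compared_m


# Examples from the KANZI table above:
print(speedup(80, 38))    # ~2.105 (e.g., the enwik8 workload)
print(speedup(80, None))  # -1.0   (e.g., the deepfield workload)
```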
RQ3: What benefits do ranked workload similarity analysis and weighted configuration seeding each provide?
The script `statistical tool/RQ3_Non-parametric.py` performs a non-parametric test and an effect size test for pairwise comparisons between DLiSA and its variants. The results are stored in the `statistical tool/RQ3/` folder.
The results are stored in a CSV file with the following columns:
Environment | DLiSA-0.3_Mean | DLiSA-I_Mean | Wilcoxon_p | A12 | Label |
---|---|---|---|---|---|
beethoven.wav | 2.5731 | 2.6442 | 0.02288209 | 0.59295 | 1 |
dual-channel.wav | 0.8462 | 0.9274 | 0.009418769 | 0.6062 | 1 |
helix.wav | 1.3088 | 1.4313 | 0.006757337 | 0.6108 | 1 |
single-channel.wav | 0.6425 | 0.6778 | 0.04459703 | 0.582 | 1 |
speech.wav | 1.0449 | 1.1274 | 0.000527182 | 0.6416 | 1 |
sweep.wav | 0.2975 | 0.3074 | 0.002268491 | 0.62165 | 1 |
- Environment: Represents different workloads tested.
- {algorithm}_Mean: Represents the mean value over 100 independent runs for each algorithm (e.g., DLiSA-0.3_Mean, DLiSA-I_Mean).
- Wilcoxon_p: The p-value from the non-parametric test, indicating whether the difference between the algorithms is statistically significant.
- A12: The effect size, indicating the magnitude of the performance difference between the algorithms.
- Label: The label is 1 when the performance of DLiSA is better than that of its variants, i.e., when both of the following conditions hold:
  - A12 >= 0.56
  - Wilcoxon_p <= 0.05
In other words, when the A12 value is at least 0.56 and the Wilcoxon p-value is at most 0.05, DLiSA performs better than the comparison algorithm, and the Label is marked as 1. For detailed results, refer to the supplementary file.
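The decision rule behind the Label column can be sketched as below; note that the paired Wilcoxon signed-rank test and this particular A12 formulation are assumptions about how RQ3_Non-parametric.py works rather than a verbatim excerpt (scipy is assumed to be installed):

```python
from scipy.stats import wilcoxon


def a12(x, y):
    """Effect size: probability that a value from x beats one from y.

    Assumes a minimization objective (e.g., runtime), so smaller is better.
    """
    wins = sum((xi < yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    return wins / (len(x) * len(y))


def label(dlisa_runs, variant_runs, a12_threshold=0.56, alpha=0.05):
    """Return 1 when DLiSA is both statistically and practically better."""
    p_value = wilcoxon(dlisa_runs, variant_runs).pvalue
    effect = a12(dlisa_runs, variant_runs)
    return int(effect >= a12_threshold and p_value <= alpha)
```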
The script `statistical tool/RQ4_Scott-Knott.py` performs Scott-Knott ranking analysis to evaluate the performance of DLiSA with different parameter $\alpha$ settings. The results are stored in the `statistical tool/RQ4/` directory, with each system's ranking saved in a CSV file: `statistical tool/RQ4/{system_name}.csv`.
Since the workload order is randomly shuffled across 100 independent runs, and the optimization algorithms used are population-based methods, the initial population may vary in each execution. As a result, the reproduced results may exhibit slight statistical fluctuations compared to the reported data in the paper. However, these variations should not affect the overall conclusions of the study.
- Running `main.py` typically takes around 3 hours to complete.
- The evaluation scripts for RQ1 ~ RQ4 in the `statistical tool` folder generally take only a few minutes each.