-
Understand and distinguish these concepts (a toy contrast of the first two is sketched after this list):
- Sparsification
- Pruning
- Quantization
- Distillation
- MoEfication
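
To make the pruning-vs-quantization distinction concrete, here is a minimal, purely illustrative sketch (not a deliverable): magnitude pruning zeroes out small weights while keeping full precision, whereas quantization keeps every weight but stores it at lower precision. All numbers and thresholds here are arbitrary toy choices.

```python
# Toy illustration only: contrast pruning (zero out small weights, keep
# precision) with quantization (keep all weights, reduce precision).
import torch

torch.manual_seed(0)
w = torch.randn(4, 4)

# Magnitude pruning / sparsification: zero roughly the smallest 50% of weights.
threshold = w.abs().median()
w_pruned = torch.where(w.abs() >= threshold, w, torch.zeros_like(w))

# Quantization: map each float32 weight to an int8 code plus a shared scale.
scale = w.abs().max() / 127
w_quantized = torch.round(w / scale).to(torch.int8)

print("pruned fraction of zeros:", (w_pruned == 0).float().mean().item())
print("quantized dtype:", w_quantized.dtype)
```

Distillation (training a smaller student model to mimic a larger teacher) and MoEfication (splitting feed-forward layers into sparsely activated experts) operate at the training or architecture level rather than on individual weight tensors, so they are not shown here.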
-
Choose your models. Pick 3 models, 1 from each category; each pick should have more than 1B parameters before pruning (a parameter-counting sketch follows the list below).
- Encoder-only
- Decoder-only
- Encoder-Decoder
You can find info about model sizes at https://openbmb.github.io/BMList/list/. You may use Hugging Face or any other model hub you see fit.
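
A minimal sketch for sanity-checking parameter counts before committing to a pick, assuming the `transformers` library; the model names below are placeholder examples only, so verify that each of your actual picks exceeds 1B parameters.

```python
# Placeholder model names only; substitute your own picks and verify
# that each exceeds 1B parameters.
from transformers import AutoModel

candidates = {
    "encoder-only": "facebook/xlm-roberta-xl",  # example
    "decoder-only": "gpt2-xl",                  # example
    "encoder-decoder": "t5-3b",                 # example
}

for category, name in candidates.items():
    model = AutoModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{category}: {name} ~ {n_params / 1e9:.2f}B parameters")
```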
-
Devise approaches to assess the sparsity structure of your chosen models and answer these questions (a measurement sketch follows this list):
- what fraction of parameters is >> 0? overall? by layer?
- how does this vary by layer?
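
One possible way to measure this, assuming PyTorch and `transformers`; the model name and the near-zero tolerance `eps` are assumptions you would replace with (and justify as) your own choices.

```python
# Count the fraction of effectively nonzero weights, per parameter tensor
# and overall. The model name and `eps` are placeholders.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("gpt2")  # placeholder
eps = 1e-6  # treat |w| <= eps as effectively zero

total = nonzero = 0
for name, param in model.named_parameters():
    p_nonzero = (param.detach().abs() > eps).sum().item()
    print(f"{name}: {p_nonzero / param.numel():.4f} fraction nonzero")
    total += param.numel()
    nonzero += p_nonzero

print(f"overall: {nonzero / total:.4f} fraction nonzero")
```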
-
Produce sparsified versions of your models at 10%, 50%, 90%, 95%, and 99% sparsity, either by coding your own methods or by using the existing tools provided below. Explain the nature of your methods, regardless of whether you code them yourselves.
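
As one example of such a method, the sketch below applies unstructured global magnitude (L1) pruning with `torch.nn.utils.prune`; the model name is a placeholder and this is only one of several valid approaches.

```python
# One possible method: unstructured global magnitude (L1) pruning of Linear
# layers with torch.nn.utils.prune. The model name is a placeholder.
import copy
import torch.nn as nn
import torch.nn.utils.prune as prune
from transformers import AutoModel

base = AutoModel.from_pretrained("bert-base-uncased")  # placeholder

for sparsity in [0.10, 0.50, 0.90, 0.95, 0.99]:
    model = copy.deepcopy(base)
    params_to_prune = [
        (m, "weight") for m in model.modules() if isinstance(m, nn.Linear)
    ]
    # Zero the `sparsity` fraction of Linear weights with the smallest
    # magnitude, ranked globally across all selected layers.
    prune.global_unstructured(
        params_to_prune, pruning_method=prune.L1Unstructured, amount=sparsity
    )
    # Bake the masks into the weights so the model can be saved normally.
    for module, name in params_to_prune:
        prune.remove(module, name)
    model.save_pretrained(f"sparsified-{int(sparsity * 100)}")
```

Note that this prunes only `nn.Linear` weights; depending on the architecture, you may also want to include (or deliberately exclude) embeddings and other weight matrices, and you should explain that choice in your report.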
-
Find 2 common benchmarks used by your models by reviewing their publications.
Set them up and obtain baseline results for the original models.
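
As an illustration of obtaining one baseline number, the sketch below computes a rough WikiText-2 perplexity over non-overlapping windows; the model name is a placeholder, and your actual benchmarks should be the ones used in the models' publications.

```python
# Rough perplexity baseline on WikiText-2 (illustration only; replace with
# the benchmarks reported in your models' papers).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute your decoder-only pick
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Concatenate the test split into one long string and tokenize it once.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Average negative log-likelihood over non-overlapping windows.
window = 1024
nlls = []
for begin in range(0, input_ids.size(1), window):
    chunk = input_ids[:, begin : begin + window]
    if chunk.size(1) < 2:
        continue
    with torch.no_grad():
        nlls.append(model(chunk, labels=chunk).loss)

print("perplexity:", torch.exp(torch.stack(nlls).mean()).item())
```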
Compare performance of your sparsified versions with the baselines. Include plots and explanations.
-
Compare model size and runtime for the sparsified models against the originals. Include plots and explanations.
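
A sketch of the size and runtime comparison, assuming the checkpoints were saved with `save_pretrained`; the checkpoint paths, tokenizer name, and prompt are all placeholders.

```python
# Compare on-disk size and wall-clock forward-pass time across checkpoints.
# Paths and the prompt are placeholders.
import os
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def dir_size_mb(path):
    """Total size of all files under `path`, in MB."""
    return sum(
        os.path.getsize(os.path.join(root, f))
        for root, _, files in os.walk(path)
        for f in files
    ) / 1e6

def mean_latency_ms(model, inputs, n_runs=20):
    """Average forward-pass time over `n_runs` runs, in milliseconds."""
    with torch.no_grad():
        model(**inputs)  # warm-up
        start = time.perf_counter()
        for _ in range(n_runs):
            model(**inputs)
    return (time.perf_counter() - start) / n_runs * 1000

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder base model
inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")

for path in ["baseline-checkpoint", "sparsified-90"]:  # placeholder checkpoint dirs
    model = AutoModelForCausalLM.from_pretrained(path).eval()
    print(f"{path}: {dir_size_mb(path):.0f} MB, {mean_latency_ms(model, inputs):.1f} ms/forward")
```

Keep in mind that unstructured pruning stores zeros in a dense checkpoint, so on-disk size and dense-matmul runtime may barely change unless you convert to a sparse storage format or use structured sparsity; this is worth discussing alongside your plots.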
-
Explain the challenges of sparsification on LLMs.
- Due: Nov 9th, 12 PM CST
- Fork your public GitHub repository and change the repo name to llm-sparsification-<cnetid>
- we will look out for the following files:
  - report.md
  - src/*
  - requirements.txt for pip, or environment.yml for conda
  - any Jupyter notebooks