Description
ds_config_zero3.json LICENSE main.py README.md scripts src train.py
(transmla) lixishi@node01:~/TransMLA$ python main.py --model-path /home/lixishi/llm_model/Llama-2-7b-hf/ --ppl-eval-batch-size 8 --dim2head 4 --qk-mqa-dim 128 --q-lora-rank 512 --kv-lora-rank 896
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 2/2 [00:10<00:00, 5.32s/it]
100%|██████████████████████████████████████████████████████████████████████████████| 16/16 [00:20<00:00, 1.28s/it]
++++++++++Original Model:++++++++++
100%|██████████████████████████████████████████████████████████████████████████████| 21/21 [01:16<00:00, 3.64s/it]
Original ppl: 5.4734
++++++++++RemoveRope Model:++++++++++
100%|██████████████████████████████████████████████████████████████████████████████| 16/16 [00:21<00:00, 1.37s/it]
100%|██████████████████████████████████████████████████████████████████████████████| 21/21 [02:32<00:00, 7.27s/it]
Remove RoPE ppl: 8.9670
++++++++++LoraQKV Model:++++++++++
Traceback (most recent call last):
  File "/home/lixishi/TransMLA/main.py", line 111, in <module>
    main(args)
  File "/home/lixishi/TransMLA/main.py", line 86, in main
    setattr(layer, "self_attn",LoraQKV(
                               ^^^^^^^^
  File "/home/lixishi/TransMLA/src/lora_qkv.py", line 103, in __init__
    self.init_deepseek(self_attn, R_q, R_kv)
  File "/home/lixishi/TransMLA/src/lora_qkv.py", line 108, in init_deepseek
    q_a_weight = (R_q.T@self_attn.q_proj.weight.data.to(torch.float64))[:self.q_lora_rank].to(self.dtype)
                  ^^^^^
AttributeError: 'NoneType' object has no attribute 'T'
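The last frame shows that `R_q` arrives in `init_deepseek` as `None`. The failure therefore happens at the attribute access `R_q.T`, before any matrix multiplication runs. The pure-Python sketch below reproduces that failure mode and adds a hypothetical early check. `require_rotation` is an illustrative name and is not part of TransMLA. The sketch does not use torch, and it does not explain why `R_q` was left unset for these flags.

```python
from typing import Any, Optional


def reproduce_failure() -> str:
    """Reproduce the error in the traceback: reading .T on None."""
    R_q: Optional[Any] = None  # stands in for the R_q passed to init_deepseek
    try:
        R_q.T  # same attribute access as line 108 of src/lora_qkv.py
    except AttributeError as exc:
        return str(exc)
    return "no error"


def require_rotation(name: str, R: Optional[Any]) -> Any:
    """Hypothetical guard (not TransMLA code): fail early with a message
    that names the missing rotation matrix instead of a bare AttributeError."""
    if R is None:
        raise ValueError(
            f"{name} is None; it must be computed before init_deepseek uses {name}.T"
        )
    return R


print(reproduce_failure())  # 'NoneType' object has no attribute 'T'
```

A check like `require_rotation("R_q", R_q)` at the top of a patched `init_deepseek` would name the missing matrix directly. The reader could then trace back to whichever earlier step was expected to produce it.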