I hereby declare that Karcher-merge, including both the idea and the code, is my personal intellectual achievement (developed with the assistance of o3-mini).

The purpose of Karcher-merge is better model merging. Its core idea is to treat neural network weights as points on a Riemannian manifold and to use the Karcher mean (also known as the Fréchet mean or geometric mean) to compute their intrinsic average, taking into account both the direction and the magnitude of the weight tensors. The method decomposes each weight tensor into a directional component (lying on the unit sphere) and a magnitude component, computes the Karcher mean of the directions with an iterative algorithm, and uses a weighted arithmetic mean for the merged magnitude. This geometric approach effectively handles orthogonal or nearly orthogonal weights, preserves the directional information of the weight tensors with appropriate scaling, and ultimately achieves results that surpass traditional model merging methods such as Model Stock.

I am making this declaration public again because sthenno (QQ number: 1829321520) attempted to steal my work under the pretense of collaboration in order to establish a paper partnership with Alibaba (Qwen team?). After fraudulently obtaining my code and paper, sthenno (QQ number: 1829321520) blocked me on various social media platforms (see attachments 1 and 2 for details), preventing me from properly claiming ownership of my intellectual achievement.

I have learned through certain channels (Sthenno personally, and Discord and QQ groups) that this paper has passed through Alibaba's internal process. Therefore, I am compelled to make the existing code and the main ideas of the paper public. Please take note.

I hope that relevant staff members at Alibaba who see this article will contact me as soon as possible (jmes100010@gmail.com, QQ: 3695314497).
See the Evidence section for details.
`Karcher_merge.py` is a Python script for merging model weights using the Karcher mean, a concept from Riemannian geometry. It supports both `.safetensors` and `.bin` formats and allows merging up to 100 model weights.
See the paper:
- Supports `.safetensors` and `.bin` model formats
- Uses Karcher mean for weighted averaging of model parameters
- Customizable weight distribution via `--alphas`
- Runs on `cpu` or `cuda`
- Optionally copies extra non-weight files from the first model's directory
Ensure you have Python 3 and install dependencies:

    pip install torch safetensors

Example usage:

    python Karcher_merge.py --models modelA.safetensors modelB.bin \
        --alphas 0.4 0.6 --output merged.safetensors \
        --device cuda --karcher-iter 10 --karcher-tol 1e-5
| Argument | Description |
|---|---|
| `--models` | List of model weight files to merge (2-100 files) |
| `--alphas` | Weight coefficients for merging (default: equal weights) |
| `--device` | Compute device: `cpu` or `cuda` (default: `cpu`) |
| `--output` | Output filename (default: `merged.safetensors`) |
| `--copy-extra-files` | Copy additional non-weight files from first model |
| `--karcher-iter` | Maximum iterations for Karcher mean computation (default: 10) |
| `--karcher-tol` | Convergence tolerance for Karcher mean algorithm (default: `1e-5`) |
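
The argument table above could be wired up roughly as follows. This is only a sketch built from the documented flags and defaults, not the parser used in `Karcher_merge.py`. The helper names `build_parser` and `parse_args` are hypothetical. Treating `--copy-extra-files` as a boolean switch and reading "equal weights" as `1/n` per model are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented command-line flags."""
    p = argparse.ArgumentParser(
        description="Merge model weights with the Karcher mean")
    p.add_argument("--models", nargs="+", required=True,
                   help="model weight files to merge (2-100 files)")
    p.add_argument("--alphas", nargs="+", type=float, default=None,
                   help="weight coefficients (default: equal weights)")
    p.add_argument("--device", choices=["cpu", "cuda"], default="cpu")
    p.add_argument("--output", default="merged.safetensors")
    # Assumption: the flag is a simple on/off switch.
    p.add_argument("--copy-extra-files", action="store_true")
    p.add_argument("--karcher-iter", type=int, default=10)
    p.add_argument("--karcher-tol", type=float, default=1e-5)
    return p


def parse_args(argv=None):
    """Parse and validate arguments; argv=None falls back to sys.argv."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not 2 <= len(args.models) <= 100:
        parser.error("--models expects between 2 and 100 files")
    if args.alphas is None:
        # Assumption: "equal weights" means 1/n for each of the n models.
        args.alphas = [1.0 / len(args.models)] * len(args.models)
    elif len(args.alphas) != len(args.models):
        parser.error("--alphas must have one value per model")
    return args
```

For example, `parse_args(["--models", "modelA.safetensors", "modelB.bin"])` would fall back to the documented defaults for every other option.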
The script implements the Karcher mean method to merge model weights iteratively:
- Normalize and align tensors: Ensures tensors have compatible shapes.
- Compute Karcher mean: Uses Riemannian gradient descent, averaging in the tangent space at the current estimate and mapping the result back onto the manifold.
  Given tensors $T_1, T_2, \dots, T_n$ with unit-norm directions $U_i = T_i / \|T_i\|$ and weights $\alpha_1, \alpha_2, \dots, \alpha_n$, the Karcher mean $U$ of the directions minimizes the weighted sum of squared Riemannian distances:
  $$U = \arg\min_{V} \sum_{i=1}^{n} \alpha_i d^2(V, U_i)$$
  where $d(V, U_i)$ is the Riemannian distance between directions, given by the norm of the logarithm map:
  $$d(V, U_i) = \| \log_{V}(U_i) \|$$
  For unit-norm directions this is the angle $\arccos \langle V, U_i \rangle$. The algorithm iteratively updates the estimate $V_k$ using gradient descent along the manifold:
  $$V_{k+1} = \exp_{V_k} \left( \sum_{i=1}^{n} \alpha_i \log_{V_k} (U_i) \right)$$
  Iteration stops on convergence within `--karcher-tol` or after `--karcher-iter` iterations.
- Rescale merged tensors: Applies global scaling based on original tensor norms:
  $$s = \sum_{i=1}^{n} \alpha_i \|T_i\|$$
  The final merged tensor is computed as:
  $$T^* = s \cdot U$$
  where $U$ is the unit-norm Karcher mean from the previous step.
- Save output: Writes merged weights to a `.safetensors` file.
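
The direction and magnitude steps above can be illustrated with a small dependency-free sketch. It is not the code in `Karcher_merge.py`: it uses flat Python lists instead of torch tensors, and the function names (`log_map`, `exp_map`, `karcher_merge`) are hypothetical. It also assumes that every tensor is nonzero, that the weights are normalized to sum to 1, and that iteration stops when the tangent-space update has norm below the tolerance.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def log_map(p, q):
    """Sphere log map: tangent vector at unit vector p pointing toward q."""
    c = max(-1.0, min(1.0, _dot(p, q)))
    theta = math.acos(c)                       # geodesic distance (angle)
    w = [qi - c * pi for pi, qi in zip(p, q)]  # part of q orthogonal to p
    wn = _norm(w)
    if wn < 1e-12:                             # q == p (or antipodal: undefined)
        return [0.0] * len(p)
    return [theta * wi / wn for wi in w]


def exp_map(p, v):
    """Sphere exp map: follow the geodesic from p along tangent vector v."""
    n = _norm(v)
    if n < 1e-12:
        return list(p)
    return [math.cos(n) * pi + math.sin(n) * vi / n for pi, vi in zip(p, v)]


def karcher_merge(tensors, alphas, max_iter=10, tol=1e-5):
    """Merge flat tensors: Karcher mean of directions, weighted mean of norms."""
    total = sum(alphas)
    alphas = [a / total for a in alphas]       # assumption: weights sum to 1
    norms = [_norm(t) for t in tensors]        # assumption: all norms > 0
    dirs = [[x / n for x in t] for t, n in zip(tensors, norms)]
    dim = len(dirs[0])
    # Initial guess: normalized weighted arithmetic mean of the directions.
    u = [sum(a * d[j] for a, d in zip(alphas, dirs)) for j in range(dim)]
    un = _norm(u)
    u = [x / un for x in u] if un > 1e-12 else list(dirs[0])
    for _ in range(max_iter):
        logs = [log_map(u, d) for d in dirs]
        step = [sum(a * l[j] for a, l in zip(alphas, logs)) for j in range(dim)]
        if _norm(step) < tol:                  # converged
            break
        u = exp_map(u, step)
    s = sum(a * n for a, n in zip(alphas, norms))  # merged magnitude
    return [s * x for x in u]
```

With two orthogonal tensors of norm 2 and equal weights, for instance `karcher_merge([[2.0, 0.0], [0.0, 2.0]], [0.5, 0.5])`, the result keeps norm 2, whereas a plain linear average `[1.0, 1.0]` would shrink to norm √2.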
- Ensure models have compatible architectures before merging.
- Large models may require substantial memory.
- Different `--alphas` values will influence how model weights are blended.
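
One way to check the architecture-compatibility note above before merging is to compare parameter names and shapes across the loaded state dicts. The sketch below is an illustration only and is not taken from `Karcher_merge.py`. The function name `find_incompatibilities` is hypothetical, and it assumes each state dict maps parameter names to objects with a `.shape` attribute (as torch tensors have).

```python
def find_incompatibilities(state_dicts):
    """Return a list of human-readable problems; empty means compatible.

    Every state dict is compared against the first one, which serves
    as the reference model.
    """
    reference = state_dicts[0]
    problems = []
    for idx, sd in enumerate(state_dicts[1:], start=1):
        # Parameters present in only one of the two models.
        for name in sorted(set(reference) ^ set(sd)):
            problems.append(f"model {idx}: key mismatch on {name!r}")
        # Parameters present in both models but with different shapes.
        for name in sorted(set(reference) & set(sd)):
            ref_shape = tuple(reference[name].shape)
            shape = tuple(sd[name].shape)
            if ref_shape != shape:
                problems.append(
                    f"model {idx}: shape mismatch on {name!r}: "
                    f"{ref_shape} vs {shape}")
    return problems
```

An empty result suggests the models can be merged parameter by parameter. Any reported mismatch is worth resolving before running the merge.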
This project is licensed under a custom license.
Commercial and academic use is strictly prohibited without explicit written permission.
See LICENSE for full details.
I have to protect myself because of the actions of some people.