Implemented automated broadcasting in weight rescale when number of model shards is fewer than number of experts by jacobthebanana · Pull Request #265 · xai-org/grok-1

Implemented automated broadcasting in weight rescale when number of model shards is fewer than number of experts #265

Open
jacobthebanana wants to merge 1 commit into xai-org:main from VectorInstitute:quantization-non-8-mp-sharding-fix