Implemented automated broadcasting in weight rescale when number of model shards is fewer than number of experts by jacobthebanana · Pull Request #265 · xai-org/grok-1 · GitHub


Implemented automated broadcasting in weight rescale when number of model shards is fewer than number of experts #265

Open

jacobthebanana wants to merge 1 commit into xai-org:main from VectorInstitute:quantization-non-8-mp-sharding-fix



Commits on Mar 21, 2024

Implemented automated broadcasting in weight rescale when number of model shards is less than number of experts.
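
The diff itself is not reproduced on this page, so the following is only a minimal NumPy sketch of the general idea named in the commit message, not the code from the pull request. It assumes a quantized weight of shape `(rows, cols)` whose scale factors are stored per block of rows with shape `(blocks, cols)`. When a shard holds more than one block, a plain elementwise multiply no longer broadcasts, so the scales are broadcast over each block explicitly. The function name `rescale_blockwise` and both shapes are hypothetical.

```python
import numpy as np


def rescale_blockwise(weight, scales):
    """Dequantize `weight` using one row of `scales` per contiguous row block.

    Hypothetical helper: `weight` has shape (rows, cols) and `scales` has
    shape (blocks, cols), where `blocks` evenly divides `rows`.
    """
    rows, cols = weight.shape
    blocks = scales.shape[0]
    if rows % blocks != 0:
        raise ValueError("number of scale blocks must divide the row count")
    block_rows = rows // blocks
    # Group the rows by block so each block lines up with its own scale row.
    grouped = weight.astype(scales.dtype).reshape(blocks, block_rows, cols)
    # scales[:, None, :] has shape (blocks, 1, cols) and broadcasts over
    # the rows inside each block.
    return (grouped * scales[:, None, :]).reshape(rows, cols)


# Assumed example: 4 rows split into 2 blocks of 2 rows each.
weight = np.ones((4, 2), dtype=np.int8)
scales = np.array([[2.0, 3.0], [4.0, 5.0]], dtype=np.float32)
rescaled = rescale_blockwise(weight, scales)
```

With a single block (`scales` of shape `(1, cols)`), this reduces to the ordinary broadcast `weight * scales`, which is the case that needs no special handling.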