Bug description
For some reason, the tensor parallel implementation (`litgpt generate_tp`) generates nonsensical outputs:
```
⚡ python-api-tensor-parallel ~/litgpt litgpt generate_tp checkpoints/microsoft/phi-2
...
Instruct: What food do llamas eat?
Output: When the
.
The first
.
The first
.
Time for inference 1: 1.31 sec total, 15.23 tokens/sec
```
Expected output (e.g., via base or sequential generation):
```
Instruct: What food do llamas eat?
Output: Llamas eat grass, shrubs, and other vegetation.
```
What operating system are you using?
Linux
LitGPT Version
Current main branch