Stars
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
cognitive-overload-attack
[NeurIPS'24 Spotlight] Observational Scaling Laws
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
Fluent student-teacher redteaming
Inspect: A framework for large language model evaluations
A benchmark to evaluate language models on questions I've previously asked them to solve.