Stars
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
A repo for distributed training of language models with Reinforcement Learning from Human Feedback (RLHF)
LLM based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations.
Exercises for "Neuroscience for machine learners" course
Awesome papers on LLM interpretability
A benchmark to evaluate language models on questions I've previously asked them to solve.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
disentanglement_lib is an open-source library for research on learning disentangled representations.
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
A curated list of Large Language Model (LLM) Interpretability resources.
A framework for few-shot evaluation of language models.
Train transformer language models with reinforcement learning.
A library for mechanistic interpretability of GPT-style language models
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Foundational Models for State-of-the-Art Speech and Text Translation
Reference implementation for DPO (Direct Preference Optimization)
Automatic identification of regions in a model's latent space that correspond to unique concepts, that is, concepts with semantically distinct meanings.
Hands-on session on Interpretable AI at the VISUM Summer School 2022