Kamile Lukosiute (kamilelukosiute)

Stars

emergent-misalignment / emergent-misalignment

Python 148 51 Updated Mar 7, 2025

huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Python 8,725 1,105 Updated May 15, 2025

UNHSAILLab / working-memory-attack-on-llms

cognitive-overload-attack

Jupyter Notebook 14 4 Updated Mar 11, 2025

ryoungj / ObsScaling

[NeurIPS'24 Spotlight] Observational Scaling Laws

Jupyter Notebook 54 3 Updated Oct 2, 2024

andyrdt / refusal_direction

Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".

Python 218 50 Updated Oct 1, 2024

ReversecLabs / damn-vulnerable-llm-agent

Python 284 59 Updated Dec 29, 2023

Confirm-Solutions / flrt

Fluent student-teacher redteaming

Jupyter Notebook 20 4 Updated Jul 25, 2024

UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations

Python 955 227 Updated May 16, 2025

carlini / yet-another-applied-llm-benchmark

A benchmark to evaluate language models on questions I've previously asked them to solve.

Python 1,009 77 Updated Apr 27, 2025
