PhD student working on AI safety at @ethz-spylab, ETH Zürich.
https://lukas-fluri.com
- ETH Zurich
- Zürich (UTC +02:00)
Pinned
- MisleadLM (Public, forked from Jiaxin-Wen/MisleadLM)
  A re-implementation and extension of the code from the paper "Language Models Learn to Mislead Humans via RLHF".
  Python
- Targeted-Manipulation-and-Deception-in-LLMs (Public, forked from marcus-jw/Targeted-Manipulation-and-Deception-in-LLMs)
  A benchmark for evaluating the tendency of LLM agents to influence human preferences.
  Python