PhD student working on AI safety at @ethz-spylab, ETH Zürich.
https://lukas-fluri.com
- ETH Zurich
- Zürich (UTC +02:00)
Pinned
- MisleadLM (Public, forked from Jiaxin-Wen/MisleadLM)
  A re-implementation and extension of the code from the paper "Language Models Learn to Mislead Humans via RLHF".
  Python
- Targeted-Manipulation-and-Deception-in-LLMs (Public, forked from marcus-jw/Targeted-Manipulation-and-Deception-in-LLMs)
  A benchmark for evaluating the tendency of LLM agents to influence human preferences.
  Python