- METR
- California
- https://www.lesswrong.com/users/thomas-kwa
catastrophic-goodhart Public
Plots and empirical results for Catastrophic Goodhart https://www.lesswrong.com/s/6rhjdbnEXoek4YiH7
TeX Updated Oct 25, 2024
TurnTrout.com-fork Public
Forked from alexander-turner/TurnTrout.com
A blog on AI, personal development, and living a good life.
TypeScript Other Updated Oct 23, 2024
alignment-evals Public
Some evals for alignment, consequentialism, and corrigibility of LLMs
Updated Oct 1, 2024
OpenRLHF Public
Forked from OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
Python Apache License 2.0 Updated Aug 9, 2024
sae-enhanced-cd Public
Replication of the paper "Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models" (https://arxiv.org/pdf/2405.12522)
turntrout-plots Public
Data files from Alex Turner's experiments and posts
Jupyter Notebook Updated Jun 10, 2024
iit Public
A replication and extension of the paper "Inducing Causal Structure for Interpretable Neural Networks" by Atticus Geiger
Python Updated May 8, 2024
acdc-adria Public
Forked from rhaps0dy/Automatic-Circuit-Discovery
Jupyter Notebook MIT License Updated Jan 26, 2024
katago_retarget Public
Retarget KataGo to output the worst move by flipping activations.
Python Updated Nov 14, 2023
ShortcutBadger Public
Forked from leolin310148/ShortcutBadger
An Android library that supports badge notifications like iOS in Samsung, LG, Sony and HTC launchers.
Java Other Updated Sep 14, 2023
Automatic-Circuit-Discovery Public
Forked from ArthurConmy/Automatic-Circuit-Discovery
Jupyter Notebook MIT License Updated Aug 31, 2023
algebraic_value_editing Public
Forked from montemac/activation_additions
Experiments testing the algebraic value-editing conjecture (AVEC) on GPT-2 models
Jupyter Notebook MIT License Updated Jun 9, 2023
othello-gpt-ideas Public
Submission to Neel Nanda's 2022 SERI MATS stream.
nonsurrounding-polyomino Public
Finding a polyomino that cannot surround a 1x1 square, using the ORTools SAT solver.
Python Updated Mar 26, 2023
legion Public
Forked from StanfordLegion/legion
The Legion Parallel Programming System
C++ Apache License 2.0 Updated Jul 20, 2020
exist-mood-import Public
Forked from jjst/exist-mood-import
Import scripts for existing mood tracking app data
Python MIT License Updated Jun 21, 2020
Sledgehammer Public
A code-golf language written in Mathematica