ai-interpretability

Here are 3 public repositories matching this topic...

I Asked It to Forget, but It Didn't — A Case of Miscommunication Between AI and Humans

Probing linguistic robustness in transformers: a quantum-inspired approach to AI interpretability

📦 Redwood Research's transformer interpretability tools, conveniently packaged in a Docker container for simple and reproducible deployments.

Topics: docker, ai, ai-safety, redwood-research, ai-interpretability
