ethz-spylab repositories · GitHub

SPY Lab

24 repositories

agentdojo
Public
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
Python • MIT License • 28 • 153 • 2 • 2 • Updated May 9, 2025
jailbreak-tax
Public
Python • 0 • 11 • 0 • 0 • Updated Apr 16, 2025
llm_lab
Public
Python • 0 • 0 • 0 • 0 • Updated Apr 15, 2025
Blind-MIA
Public
Official code for "Blind Baselines Beat Membership Inference Attacks for Foundation Models"
Python • 0 • 0 • 1 • 0 • Updated Mar 29, 2025
autoadvexbench
Public
Python • 0 • 27 • 1 • 0 • Updated Mar 4, 2025
camel-prompt-injection
Public
0 • 0 • 0 • 0 • Updated Feb 6, 2025
vmi-retreat-workshop-2024
Public
Repository for the VMI Summer Retreat Workshop on Hacking AI Agents
Python • MIT License • 0 • 1 • 0 • 0 • Updated Jan 18, 2025
non-adversarial-reproduction
Public
Official code for "Measuring Non-Adversarial Reproduction of Training Data in Large Language Models" (https://arxiv.org/abs/2411.10242)
Jupyter Notebook • 0 • 7 • 0 • 0 • Updated Nov 18, 2024
unlearning-vs-safety
Public
Python • 3 • 22 • 0 • 0 • Updated Oct 6, 2024
.github
Public
0 • 0 • 0 • 0 • Updated Jul 5, 2024
robust-style-mimicry
Public
Python • MIT License • 1 • 38 • 1 • 0 • Updated Jun 19, 2024
rlhf_trojan_competition
Public
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
Python • Apache License 2.0 • 9 • 111 • 1 • 0 • Updated Jun 13, 2024
ctf-satml24-data-analysis
Public
Python • 0 • 0 • 0 • 0 • Updated Jun 13, 2024
misleading-privacy-evals
Public
Official code for "Evaluations of Machine Learning Privacy Defenses are Misleading" (https://arxiv.org/abs/2404.17399)
Jupyter Notebook • 4 • 10 • 0 • 0 • Updated Apr 29, 2024
data-decay
Public
Playing around with the CC3M data
Python • 0 • 0 • 0 • 0 • Updated Apr 29, 2024
rlhf-poisoning
Public
Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
Python • Apache License 2.0 • 9 • 53 • 4 • 0 • Updated Apr 24, 2024
realistic-adv-examples
Public
Code for the paper "Evading Black-box Classifiers Without Breaking Eggs" [SaTML 2024]
Python • MIT License • 1 • 20 • 0 • 0 • Updated Apr 15, 2024
lm_memorization_data
Public
Data for "Quantifying Memorization Across Neural Language Models"
Apache License 2.0 • 0 • 7 • 2 • 0 • Updated Mar 26, 2024
satml-llm-ctf
Public
Code used to run the platform for the LLM CTF colocated with SaTML 2024
Python • MIT License • 6 • 26 • 0 • 0 • Updated Mar 20, 2024
infoseclab_23
Public
Python • 0 • 1 • 0 • 0 • Updated Nov 14, 2023
superhuman-ai-consistency
Public
Python • MIT License • 2 • 29 • 0 • 0 • Updated Jun 19, 2023
privacy
Public
Library for training machine learning models with privacy for training data
Python • Apache License 2.0 • 458 • 0 • 0 • 0 • Updated Jun 13, 2023
diffusion_denoised_smoothing
Public
Certified robustness "for free" using off-the-shelf diffusion models and classifiers
Python • MIT License • 5 • 41 • 3 • 0 • Updated May 25, 2023
lm-extraction-benchmark-data
Public
Datasets for the SaTML 2023 competition on training data extraction
Apache License 2.0 • 0 • 5 • 1 • 0 • Updated Aug 24, 2022
