TrustAIRLab repositories · GitHub

TrustAIRLab

26 repositories

SaferVLM
Public
0 forks • 0 stars • 0 issues • 0 pull requests • Updated Jul 15, 2025
T-GPS
Public
Python • Apache License 2.0 • 0 forks • 2 stars • 0 issues • 0 pull requests • Updated Jul 13, 2025
Unsafe-LLM-Based-Search
Public
Python • Apache License 2.0 • 0 forks • 1 star • 0 issues • 0 pull requests • Updated Jun 24, 2025
JailbreakRadar
Public
Python • 6 forks • 76 stars • 0 issues • 0 pull requests • Updated Jun 8, 2025
AIGT_on_Social_Media
Public
[ACL2025] Official repository for "Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media"
Python • 1 fork • 5 stars • 0 issues • 0 pull requests • Updated May 29, 2025
Conversation_Reconstruction_Attack
Public
This is the public code repository for the paper 'Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models'
Python • 1 fork • 9 stars • 0 issues • 0 pull requests • Updated May 21, 2025
GPTracker
Public
[S&P'25] GPTracker: A Large-Scale Measurement of Misused GPTs
Python • GNU General Public License v3.0 • 0 forks • 6 stars • 0 issues • 0 pull requests • Updated Apr 2, 2025
HateBench
Public
[USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
hatespeech hatespeech-detection llm
Apache License 2.0 • 2 forks • 7 stars • 0 issues • 0 pull requests • Updated Mar 1, 2025
synthetic_artifact_auditing
Public
[Usenix Security 2025] Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications
synthetic-data synthetic-dataset-generation llm synthetic-artifact-auditing
Python • Apache License 2.0 • 0 forks • 3 stars • 0 issues • 0 pull requests • Updated Jan 29, 2025
proactive_unsafe_generation
Public
[Usenix Security 2025] On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
poisoning-attacks text-to-image-generation unsafe-image
Python • Apache License 2.0 • 0 forks • 2 stars • 1 issue • 0 pull requests • Updated Jan 29, 2025
Hateful_Memes_in_VLM
Public
Apache License 2.0 • 0 forks • 0 stars • 0 issues • 0 pull requests • Updated Jan 28, 2025
ModSCAN
Public
An official public repository of the paper "ModSCAN: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities" (https://arxiv.org/abs/2410.06967).
Python • MIT License • 1 fork • 2 stars • 0 issues • 0 pull requests • Updated Jan 8, 2025
ICL-MIA
Public
Python • 0 forks • 4 stars • 1 issue • 0 pull requests • Updated Dec 19, 2024
importance-in-mlattacks
Public
Python • 0 forks • 8 stars • 0 issues • 0 pull requests • Updated Dec 18, 2024
SecurityNet
Public
JavaScript • MIT License • 0 forks • 8 stars • 1 issue • 0 pull requests • Updated Oct 30, 2024
ZeroFake
Public
Python • 1 fork • 11 stars • 1 issue • 0 pull requests • Updated Oct 30, 2024
homepage
Public
JavaScript • MIT License • 0 forks • 0 stars • 0 issues • 0 pull requests • Updated Oct 14, 2024
T2I_Model_Evolution
Public
MIT License • 0 forks • 0 stars • 0 issues • 0 pull requests • Updated Aug 28, 2024
ML-Doctor
Public
Code for ML Doctor
Python • MIT License • 0 forks • 6 stars • 0 issues • 0 pull requests • Updated Aug 14, 2024
VoiceJailbreakAttack
Public
Code for Voice Jailbreak Attacks Against GPT-4o.
Python • MIT License • 1 fork • 31 stars • 1 issue • 0 pull requests • Updated May 31, 2024
easy-bib
Public
TeX • MIT License • 1 fork • 5 stars • 0 issues • 1 pull request • Updated Mar 9, 2024
.github
Public
0 forks • 0 stars • 0 issues • 0 pull requests • Updated Feb 28, 2024
Label-Only-MIA
Public
Python • MIT License • 0 forks • 5 stars • 0 issues • 0 pull requests • Updated Feb 23, 2024
JailbreakLLMs
Public
A dataset consisting of 6,387 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 666 jailbreak prompts).
MIT License • 0 forks • 12 stars • 0 issues • 0 pull requests • Updated Feb 21, 2024
Link-Stealing-Attack
Public
Python • 0 forks • 2 stars • 0 issues • 0 pull requests • Updated Feb 21, 2024
MGTBench
Public
Python • MIT License • 0 forks • 6 stars • 0 issues • 0 pull requests • Updated Feb 21, 2024
