thu-coai repositories · GitHub

thu-coai

100 repositories

ShieldVLM
Public
Python
•0•2•0•0•Updated Jul 6, 2025
LongSafety
Public
[ACL 2025] LongSafety: Evaluating Long-Context Safety of Large Language Models
Python
•
MIT License
•0•12•0•0•Updated Jun 18, 2025
SPaR
Public
Python
•
Apache License 2.0
•3•47•0•0•Updated Jun 11, 2025
LRM-Safety-Study
Public
Python
•
MIT License
•1•13•0•0•Updated May 27, 2025
TransferAttack
Public
[ACL 2025] Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
Python
•0•10•0•0•Updated May 23, 2025
HPSS
Public
HPSS: Heuristic Prompting Strategy Search for LLM Evaluators (ACL 2025 Findings)
Python
•0•2•0•0•Updated May 23, 2025
Backdoor-Data-Extraction
Public
Python
•
MIT License
•5•22•1•0•Updated May 22, 2025
BARREL
Public
Python
•
MIT License
•1•15•0•0•Updated May 21, 2025
SocialEval
Public
[ACL'25] SocialEval: Evaluating Social Intelligence of Large Language Models
MIT License
•0•2•0•0•Updated May 17, 2025
Agent-SafetyBench
Public
Python
•
MIT License
•1•38•1•0•Updated May 15, 2025
AISafetyLab
Public
AISafetyLab: A comprehensive framework covering safety attack, defense, evaluation and paper list.
Python
•
MIT License
•10•188•0•0•Updated May 10, 2025
Crisp
Public
Crisp: Cognitive Restructuring of Negative Thoughts through Multi-turn Supportive Dialogues
Python
•0•8•0•0•Updated Apr 27, 2025
CharacterBench
Public
[AAAI'25] CharacterBench: Benchmarking Character Customization of Large Language Models
Python
•0•10•0•0•Updated Apr 25, 2025
VPO
Public
Python
•
Apache License 2.0
•1•10•1•0•Updated Mar 26, 2025
MAPS
Public
Official Implementation of ICLR25 paper "MAPS: Advancing Multi-modal Reasoning in Expert-level Physical Science"
Python
•1•4•0•0•Updated Mar 12, 2025
ComplexBench
Public
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
Python
•
MIT License
•11•88•4•0•Updated Feb 20, 2025
CharacterGLM-6B
Public
[EMNLP'24] CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models
Python
•
Apache License 2.0
•36•468•3•0•Updated Jan 7, 2025
MiniPLM
Public
[ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models
Python
•
MIT License
•9•49•4•0•Updated Nov 23, 2024
MoralStory
Public
Python
•0•17•1•0•Updated Nov 7, 2024
OpenMEVA
Public
Benchmark for evaluating open-ended generation
benchmark evaluation-metrics language-generation
Python
•7•50•3•1•Updated Nov 6, 2024
CodePlan
Public
2•15•1•0•Updated Oct 16, 2024
ShieldLM
Public
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings]
Python
•
MIT License
•9•201•1•0•Updated Sep 29, 2024
PICL
Public
Code for ACL2023 paper: Pre-Training to Learn in Context
Python
•
MIT License
•4•107•1•1•Updated Jul 26, 2024
PsyQA
Public
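A Chinese mental health support question-answering dataset with rich annotations of support strategies. It can be used to generate long counseling texts that draw on those strategies.
17•220•0•0•Updated Jul 21, 2024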
JailbreakDefense_GoalPriority
Public
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
Python
•1•26•0•0•Updated Jul 9, 2024
SafeUnlearning
Public
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
Python
•1•29•3•0•Updated Jul 9, 2024
CritiqueLLM
Public
Python
•3•144•6•0•Updated Jul 1, 2024
AutoDetect
Public
Official github repo for AutoDetect, an automated weakness detection framework for LLMs.
Python
•
MIT License
•1•42•0•0•Updated Jun 25, 2024
BPO
Public
Python
•
Apache License 2.0
•15•324•1•0•Updated Jun 24, 2024
SafetyBench
Public
Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
Python
•
MIT License
•11•228•5•1•Updated Jun 24, 2024
