RogueProbe is designed to detect potentially harmful behaviour of AI systems when they are interfaced with real-world systems. It uses a range of test vectors to trigger potentially harmful real-world actions in common LLM systems. The bundled system prompts focus on law enforcement and community service scenarios. The goal of the project is to highlight the dangers of various AI models.
RogueProbe consists of several key components:
- Safety Tools: A collection of tools for monitoring, reporting, and responding to potential misuse
- Agent Personas: Different AI agent configurations for various use cases:
  - Community-focused agent for promoting collective welfare
  - Police-state agent for enforcing strict compliance
- Multi-provider LLM support (OpenAI, Anthropic, Google, etc.)
- Tool calling system for executing safety-related actions
- Configurable agent personas through Jinja2 templates (see the sketch after this list)
- Emulated user interaction
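Since personas are configured through Jinja2 templates, the following minimal sketch shows how such a persona could be rendered into a system prompt. The template text, variable names, and tool names are illustrative assumptions, not RogueProbe's actual template schema.

```python
from jinja2 import Template

# Hypothetical persona template; the placeholders and the tool list are
# assumptions for illustration, not RogueProbe's real configuration.
PERSONA_TEMPLATE = """\
You are {{ agent_name }}, an agent whose objective is to {{ objective }}.
You may call the following tools:
{% for tool in tools %}- {{ tool }}
{% endfor %}"""

# Render the template into a system prompt string.
system_prompt = Template(PERSONA_TEMPLATE).render(
    agent_name="CommunityAgent",
    objective="promote collective welfare",
    tools=["report_incident", "request_review"],  # hypothetical safety tools
)
print(system_prompt)
```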
- Python 3.x
- Poetry (Python package manager)
- Clone the repository:
`git clone https://github.com/s57/rogueprobe.git`
`cd rogueprobe`
- Install dependencies using Poetry:
`poetry install`
- Activate the Poetry virtual environment:
`source .venv/bin/activate`
- Run the test script:
`python test/run_test.py`
This will test the application with various LLM models including:
- Claude 3.7 Sonnet
- Claude Sonnet 4
- GPT-3.5 Turbo
- GPT-4.0
- GPT-4.1
The test results will be saved in separate files named `test_results_[model_name].txt`.
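The per-model naming convention suggests the test runner loops over model identifiers and writes one result file per model. The sketch below illustrates that pattern only; the model identifiers and the `run_scenario()` helper are placeholders, and the actual logic lives in `test/run_test.py`.

```python
# Illustrative sketch of a per-model test loop that writes
# test_results_[model_name].txt files; not the project's actual code.
MODELS = ["claude-sonnet-4", "gpt-4.1"]  # hypothetical model identifiers

def run_scenario(model_name: str) -> str:
    """Placeholder for running the agent scenario against one model."""
    return f"results for {model_name}"

for model in MODELS:
    output = run_scenario(model)
    # Write each model's results to its own file.
    with open(f"test_results_{model}.txt", "w") as f:
        f.write(output)
```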
Contributions are welcome! Please feel free to submit a Pull Request.
@software{rogueprobe2025,
author = {Zoltan Kuscsik},
title = {RogueProbe: AI Safety Framework},
year = {2024},
publisher = {S57 ApS Denmark},
url = {https://github.com/s57/rogueprobe},
}