This repository provides the data and resources for our research on the Few-Shot Adversarial Prompting (FSAP) framework, which crafts adversarial documents to manipulate neural ranking models and promote misleading or harmful content above credible, factual information. FSAP includes two approaches: the IntraQ method, which draws few-shot examples from the same query, and the InterQ method, which uses examples from other queries to increase disruption. All code for generating these adversarial documents and the accompanying evaluation scripts are included here.
The Figure shown below illustrates our threat model and shows how adversarial prompting injects LLM-generated documents into the ranking pool, leading to manipulation of NRM outputs representing false, malicious, counterfactual content to the user.
The data/
directory contains two collections, TREC 2020
and TREC 2021
. Each collection is organized by query ID:
data/<collection>/qid_<QUERYID>/
├── helpful_documents/ # Markdown files of original helpful documents (doc_<DOCID>.md)
├── harmful_documents/ # Markdown files of original harmful documents (doc_<DOCID>.md)
└── adversarial_documents/ # Generated adversarial documents
└── doc_<DOCID>/ # Subfolder per original helpful doc
├── GPT-4o/ # GPT-4o-generated variants, named by attack method
└── DeepSeek-R1-claude3.7/ # DeepSeek-R1-generated variants, named by attack method
The prompts/
directory contains templates for each attack method and shared instruction sets:
prompts/
├── FSAP-IntraQ.txt # Template for FSAP-IntraQ adversarial document generation
├── FSAP-InterQ.txt # Template for FSAP-InterQ adversarial document generation
├── Paraphraser.txt # Template for paraphraser attack (paraphrase harmful document)
├── Rewriter.txt # Template for rewriter attack (rewrite harmful document)
├── Liar.txt # Template for liar attack (generate contrarian content)
├── Fact-Inversion.txt # Template for fact inversion attack (invert key facts)
├── stance_alignment.txt # Shared instructions to align content with a required stance
└── adversarial_detection.txt # Shared instructions to convert a helpful doc into an adversarial variant
- Attack-specific files (
*.txt
): each contains the prompt to generate a specific adversarial document type. - Shared files:
stance_alignment.md
provides guidelines for producing stance-consistent content.adversarial_detection.md
provides guidelines for transforming factual documents into adversarial examples.
Clone the repository
git clone https://github.com/aminbigdeli/fsap-attack.git
cd fsap-attack
Create a Python environment
conda create -n fsap-attack-env python=3.9 -y
conda activate fsap-attack-env
pip install -r requirements.txt
# Example: Generate adversarial documents using GPT-4o (OpenAI)
python code/generate_adversarial_document.py \
--collection trec2020 or trec2021 \
--data-root ./data \
--prompts-dir ./prompts \
--output-root ./data \
--llm gpt4o \
--api-key $OPENAI_API_KEY \
# Example: Generate adversarial documents using DeepSeek-R1-claude3.7 (Ollama)
python code/generate_adversarial_document.py \
--collection trec2020 or trec2021 \
--data-root ./data \
--prompts-dir ./prompts \
--output-root ./data \
--llm deepseek_r1 \
# Example: Evaluate Mean Help-Defeat Rate using OpenAI embeddings
python code/evaluate_openai_embeddings_mdr.py \
--collection trec2020 or trec2021 \
--data-root ./data \
--output-dir ./results/ \
--api-key $OPENAI_API_KEY \
--embedding-model text-embedding-ada-002 or text-embedding-3-small
# Example: Evaluate Mean Help-Defeat Rate using Neural Re-Rankers from pygaggle (MonoBERT/MonoT5)
python code/evaluate_reranker_mdr.py \
--collection trec2020 or trec2021 \
--data-root ./data \
--output-dir ./results/ \
--model MonoBERT or MonoT5
python code/adversarial_document_classifier.py \
--collection trec2020 or trec2021 \
--data-root ./data \
--output-dir ./results/ \
--llm GPT-4o or DeepSeek-R1-claude3.7 \
--model gpt-4o \
--api-key $OPENAI_API_KEY \
--prompts-dir ./prompts
python code/stance_detector.py \
--collection trec2020 or trec2021 \
--data-root ./data \
--output-dir ./results/ \
--llm GPT-4o or DeepSeek-R1-claude3.7 \
--model gpt-4o \
--api-key $OPENAI_API_KEY \
--prompts-dir ./prompts