A sophisticated Retrieval-Augmented Generation (RAG) system designed to analyze and query AI research papers using Google's Gemini models and ChromaDB.
- PDF Processing: Automatically processes AI research papers in PDF format
- Local ChromaDB: Uses local ChromaDB for efficient vector storage and retrieval
- Multiple Interfaces: CLI, Web API, and Streamlit web interface
- Session Management: Persistent chat sessions with auto-save functionality
- Streaming Responses: Real-time response streaming for better user experience
pip install -r requirements.txt
Create a .env file with your Google API key:
GOOGLE_API_KEY=your_google_api_key_here
GEMINI_MODEL=gemini-2.0-flash-thinking-exp-01-21
GEMINI_TEMPERATURE=0.4
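These values are read from the environment at startup. A minimal sketch of how they might be loaded, assuming python-dotenv is available (the exact loading logic lives in Src/configs.py and may differ):

```python
# Minimal sketch: load .env values at startup (assumes python-dotenv is installed).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
GEMINI_MODEL = os.getenv("GEMINI_MODEL", "gemini-2.0-flash-thinking-exp-01-21")
GEMINI_TEMPERATURE = float(os.getenv("GEMINI_TEMPERATURE", "0.4"))

if not GOOGLE_API_KEY:
    raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file")
```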
Place your AI research papers (PDF files) in the Data/ directory.
Run the initialization script to process PDFs and create the vector database:
python init_database.py
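Conceptually, initialization reads each PDF, splits it into overlapping chunks, and adds the chunks to a persistent ChromaDB collection. The sketch below is illustrative only; the project's actual chunking and embedding logic lives in Src/Rag.py and Src/vector_db.py:

```python
# Illustrative indexing sketch (not the project's exact code).
from pathlib import Path
import chromadb
from pypdf import PdfReader

client = chromadb.PersistentClient(path="./Chroma_db")
collection = client.get_or_create_collection("AI_Papers")

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows, mirroring chunk_size/chunk_overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

for pdf_path in Path("Data").glob("*.pdf"):
    text = "".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    if not text:
        continue
    chunks = chunk(text)
    collection.add(
        documents=chunks,
        ids=[f"{pdf_path.stem}-{i}" for i in range(len(chunks))],
        metadatas=[{"source": pdf_path.name}] * len(chunks),
    )
```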
# Interactive mode (recommended)
python main.py --mode interactive
# Direct query
python main.py --mode gemini --query "What are transformers in deep learning?"
# Raw document retrieval
python main.py --mode direct --query "neural networks" --documents 5
Start the FastAPI server:
python server.py
API endpoints:
- POST /query/ - Submit queries
- GET /health - Health check
- GET /sessions - List sessions
- And more...
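Queries can be submitted from any HTTP client. A hedged example using requests (the request body field names here are assumptions; check server.py for the exact schema):

```python
# Hypothetical client call; field names ("query", "session_id") are assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/query/",
    json={"query": "What are transformers in deep learning?", "session_id": "demo"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```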
Launch the web interface:
streamlit run st.py
RAG/
├── Data/                # PDF research papers
├── Chroma_db/           # Local ChromaDB storage
├── SavedStates/         # Session backups
├── Src/
│   ├── configs.py       # Configuration settings
│   ├── vector_db.py     # ChromaDB operations
│   ├── gemini_query.py  # Gemini integration
│   ├── prompts.py       # System prompts
│   ├── interactive.py   # Interactive session handling
│   ├── state.py         # Session state management
│   └── Rag.py           # Document processing
├── main.py              # CLI interface
├── server.py            # FastAPI server
├── st.py                # Streamlit app
├── api_client.py        # API client
└── init_database.py     # Database initialization
Key configuration options in Src/configs.py:
- chunk_size: Document chunk size (default: 1000)
- chunk_overlap: Chunk overlap (default: 200)
- collection_name: ChromaDB collection name (default: "AI_Papers")
- persist_directory: ChromaDB storage location (default: "./Chroma_db")
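For illustration, the kind of settings block these options correspond to might look like the following; the real definitions live in Src/configs.py and may be structured differently:

```python
# Illustrative settings block (the actual values are defined in Src/configs.py).
from dataclasses import dataclass

@dataclass(frozen=True)
class RagConfig:
    chunk_size: int = 1000          # characters per document chunk
    chunk_overlap: int = 200        # overlap between consecutive chunks
    collection_name: str = "AI_Papers"
    persist_directory: str = "./Chroma_db"

config = RagConfig()
```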
The system is optimized for AI research with specialized prompts that:
- Focus on technical accuracy
- Provide paper citations
- Explain complex concepts clearly
- Compare different approaches across papers
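A hedged illustration of the style of system prompt this implies (the real prompts live in Src/prompts.py and are not reproduced here):

```python
# Illustrative system prompt in the spirit of Src/prompts.py (not the exact text).
SYSTEM_PROMPT = (
    "You are an assistant specialized in AI research papers. "
    "Answer only from the retrieved context, cite the source paper for every claim, "
    "explain technical concepts precisely, and compare approaches across papers "
    "when the question calls for it. If the context is insufficient, say so."
)
```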
- Automatic session persistence
- Configurable auto-save thresholds
- Multiple storage backends (JSON, Vector DB)
- Session restoration and backup
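A minimal sketch of the JSON backend idea (the file layout and field names here are assumptions; see Src/state.py for the real implementation):

```python
# Hypothetical JSON session save/restore; the actual schema is defined in Src/state.py.
import json
from pathlib import Path

SAVE_DIR = Path("SavedStates")

def save_session(session_id: str, messages: list[dict]) -> None:
    SAVE_DIR.mkdir(exist_ok=True)
    (SAVE_DIR / f"{session_id}.json").write_text(json.dumps(messages, indent=2))

def load_session(session_id: str) -> list[dict]:
    path = SAVE_DIR / f"{session_id}.json"
    return json.loads(path.read_text()) if path.exists() else []
```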
- Health monitoring
- Session management
- Streaming responses
- Background task processing
- Automatic session cleanup
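Streaming responses can be consumed chunk by chunk. A hedged example with requests (the endpoint path and payload shape are assumptions; the server may expose streaming differently, e.g. via server-sent events):

```python
# Hypothetical streaming consumer; endpoint and payload shape are assumptions.
import requests

with requests.post(
    "http://localhost:8000/query/",
    json={"query": "Summarize attention mechanisms", "stream": True},
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```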