RAGfus

A Retrieval-Augmented Generation (RAG) system for semantic document search and embedding management.

Features

Document processing and embedding generation using BERT
Support for multiple file formats (PDF, DOCX, TXT, Python files)
Semantic search with filtering by file type and similarity threshold
Web-based user interface for document management
Document preview functionality
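
The multi-format support listed above could be handled by a small dispatcher keyed on the file extension. The following is a minimal sketch, not the project's actual extraction code: the function name `extract_text` is hypothetical, and it assumes PyPDF2 2.x or later (for `PdfReader`) and python-docx are installed, as listed under Requirements.

```python
from pathlib import Path


def extract_text(path):
    """Return the plain text of a supported document.

    Hypothetical helper: RAGfus's real extraction code may differ.
    """
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix in {".txt", ".py"}:
        # Plain-text formats are read directly; undecodable bytes are replaced.
        return path.read_text(encoding="utf-8", errors="replace")

    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # imported lazily; needs PyPDF2 2.x or later

        reader = PdfReader(str(path))
        # extract_text() may return None or an empty string for image-only pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    if suffix == ".docx":
        from docx import Document  # provided by the python-docx package

        return "\n".join(p.text for p in Document(str(path)).paragraphs)

    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Importing the PDF and DOCX libraries inside their branches keeps plain-text extraction usable even when those optional packages are missing.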

Installation

Requirements

Python 3.8+
Flask
PyTorch
Transformers (Hugging Face)
SQLite3 (the sqlite3 module ships with Python's standard library)
python-docx
PyPDF2

Setup

Clone the repository:

git clone https://github.com/aconesac/RAGfus.git
cd RAGfus

Install requirements:

pip install -r requirements.txt

Run the application:

python app.py

The application will be available at http://localhost:5000

Usage

Web Interface

The web interface provides the following functionality:

Upload individual documents
Process entire directories
Search documents semantically
Preview document content
Manage documents (delete, view)

API Endpoints

POST /upload: Upload and process a document
POST /process_directory: Process all documents in a directory
POST /search: Find documents semantically similar to a query
GET /documents: List all documents in the database
GET /documents/{id}/preview: Preview document content
DELETE /documents/{id}: Delete a document
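
As an illustration of calling these endpoints from a script, the sketch below builds a POST /search request with the standard library. The request schema is not documented here, so the JSON field names (`query`, `file_type`, `threshold`) and the helper `build_search_request` are assumptions; check app.py for the names the server actually expects.

```python
import json
import urllib.request


def build_search_request(base_url, query, file_type=None, threshold=None):
    """Build (but do not send) a POST /search request.

    The JSON field names below are assumptions, not a documented schema.
    """
    payload = {"query": query}
    if file_type is not None:
        payload["file_type"] = file_type  # assumed field name
    if threshold is not None:
        payload["threshold"] = threshold  # assumed field name

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "http://localhost:5000", "vector databases", file_type="pdf", threshold=0.5
)
# Sending the request requires a running RAGfus server:
#   with urllib.request.urlopen(request) as response:
#       results = json.load(response)
```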

How It Works

RAGfus uses BERT embeddings to represent documents semantically. When a document is uploaded or processed, the system:

Extracts text content from the document
Generates embeddings using a BERT model
Stores the document and its embedding in a SQLite database
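
The storage step can be sketched with the standard library alone. This is an illustration under stated assumptions, not the contents of database.py: the table layout, column names, and the helpers `init_db`, `add_document`, and `load_embedding` are all hypothetical, and the embedding is assumed to arrive as a plain list of floats.

```python
import sqlite3
from array import array


def init_db(conn):
    # Hypothetical schema; RAGfus's actual tables may differ.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               filename TEXT NOT NULL,
               file_type TEXT NOT NULL,
               content TEXT NOT NULL,
               embedding BLOB NOT NULL
           )"""
    )


def add_document(conn, filename, file_type, content, embedding):
    # Pack the vector as 32-bit floats so it fits in a single BLOB column.
    blob = array("f", embedding).tobytes()
    cursor = conn.execute(
        "INSERT INTO documents (filename, file_type, content, embedding) "
        "VALUES (?, ?, ?, ?)",
        (filename, file_type, content, blob),
    )
    conn.commit()
    return cursor.lastrowid


def load_embedding(conn, doc_id):
    row = conn.execute(
        "SELECT embedding FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        return None
    vector = array("f")
    vector.frombytes(row[0])  # unpack the BLOB back into floats
    return vector.tolist()
```

Storing the vector as a BLOB of 32-bit floats is compact, at the cost of rounding values to single precision.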

When searching, RAGfus:

Generates an embedding for the search query
Computes the cosine similarity between the query embedding and each stored document embedding
Returns the highest-scoring documents, optionally filtered by file type and similarity threshold
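
The search steps above can be sketched in pure Python. This is a minimal illustration rather than the code in retreiver.py: the function names, the `top_k` parameter, and the dictionary keys used for each document are assumptions, and a real implementation would more likely compute similarities with PyTorch tensors.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def search(query_embedding, documents, file_type=None, threshold=0.0, top_k=5):
    """Rank documents by cosine similarity to the query embedding.

    `documents` is a list of dicts with the hypothetical keys
    "filename", "file_type", and "embedding".
    """
    scored = []
    for doc in documents:
        # Skip documents that do not match the requested file type.
        if file_type is not None and doc["file_type"] != file_type:
            continue
        score = cosine_similarity(query_embedding, doc["embedding"])
        # Keep only documents at or above the similarity threshold.
        if score >= threshold:
            scored.append((score, doc["filename"]))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```

Because every stored embedding is compared with the query, this approach scales linearly with the number of documents, which is reasonable for small collections.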

License

MIT

Contributors

Agustín Conesa
