A document processing and RAG (Retrieval-Augmented Generation) library that transforms unstructured text from documents into rich knowledge graphs and performs entity and relation extraction based on the user's input analysis scope. By using a graph database for enhanced context retrieval, it enables natural conversation with your documents through a chat interface.
docuGraphRAG.js is the successor of docuRAG.js, representing a significant architectural shift in how we handle document context and relationships:
- Hybrid search combining vector, text, and graph-based approaches
- Entity extraction and relationship modeling based on the user's scope.
- Efficient storage and retrieval using Neo4j
- Streaming responses for real-time chat interactions
- Configurable search strategies and weights
- Splits documents into manageable chunks
- Generates vector embeddings for each chunk
- Stores content in Neo4j for efficient retrieval
- Multiple search strategies working together:
- Vector similarity search (40% weight)
- Full-text search (30% weight)
- Graph-based search (30% weight)
- Intelligent result merging and ranking
- Configurable search options
- Efficient storage and retrieval
- Stores documents as connected chunks
- Uses Neo4j for efficient storage
- Basic document-chunk relationships
- Entity relationship tracking
- Graph traversal capabilities
- Natural language interaction with documents
- Context-aware responses
- Streaming response generation
- Multiple search strategies:
- Vector similarity search
- Full-text search
- Graph-based search
- Configurable search options
- Node.js 18+
- Docker / Neo4j Database
- OpenAI API Key (gpt-4 and text-embedding-3-small)
- Clone the repository:
git clone https://github.com/msroot/docuGraphRAG.js.git
cd docuGraphRAG.js
- Install dependencies:
npm install
- Start Neo4j (using Docker):
docker-compose up -d neo4j
Access the Neo4j Browser at http://localhost:7474/browser/ to explore your graph database.
Run the example app
- Create environment file:
cd app/
# Create a new .env file
touch .env
# Add the following configuration to your .env file:
NEO4J_URL=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password
OPENAI_API_KEY=your-openai-key
- Run the example app:
npm install
npm start
The example app provides a complete web interface featuring:
- Document upload and management
- Interactive chat interface with streaming responses
- Real-time Knowledge Graph visualization
- Visual representation of document relationships
- Entity connections and relationships
- Expandable full-screen view
- Interactive node exploration
- Configurable search strategies:
- Vector similarity search
- Full-text search
- Graph-based search
- Direct access to Neo4j Browser for advanced queries
Visit http://localhost:3000 to explore the interface.
Parameter | Type | Default | Description |
---|---|---|---|
Required Settings | |||
neo4jUrl |
string | - | Neo4j database connection URL |
neo4jUser |
string | - | Neo4j database username |
neo4jPassword |
string | - | Neo4j database password |
openaiApiKey |
string | - | Your OpenAI API key |
Optional Settings | |||
chunkSize |
number | 1000 | Size of document chunks in characters |
chunkOverlap |
number | 200 | Overlap between consecutive chunks |
similarityThreshold |
number | 0.1 | Minimum similarity score for vector search |
vectorSearchWeight |
number | 0.4 | Weight for vector similarity search (0-1) |
textSearchWeight |
number | 0.3 | Weight for full-text search (0-1) |
graphSearchWeight |
number | 0.3 | Weight for graph-based search (0-1) |
Example configuration in code:
import { DocuGraphRAG } from 'docugraphrag';
const config = {
neo4jUrl: process.env.NEO4J_URL,
neo4jUser: process.env.NEO4J_USER,
neo4jPassword: process.env.NEO4J_PASSWORD,
openaiApiKey: process.env.OPENAI_API_KEY,
// Optional settings
chunkSize: 1000,
chunkOverlap: 200,
similarityThreshold: 0.1,
vectorSearchWeight: 0.4,
textSearchWeight: 0.3,
graphSearchWeight: 0.3
};
const rag = new DocuGraphRAG(config);
import { DocuGraphRAG } from 'docugraphrag';
// Initialize
const rag = new DocuGraphRAG(config);
await rag.initialize();
// Process a document
const result = await rag.processDocument(text, "Analysis focus description");
// Chat with the document
const answer = await rag.chat("Who is Dr. Sarah Jones?", {
documentIds: ["doc123"],
vectorSearch: true,
textSearch: true,
graphSearch: true
});
graph TD
subgraph Document_Processing ["π Document Processing"]
DocInput[/"PDF/Text Document"/] --> TextProc["Text Extraction & Cleaning"]
TextProc --> Chunking["Smart Chunking"]
Chunking -->|"Overlapping chunks"| Chunks[(Processed Chunks)]
end
subgraph Knowledge_Graph_Creation ["π§ Knowledge Graph Creation"]
Chunks --> |"Chunk text"| VectorGen["Vector Embedding Generation"]
Chunks --> |"Content analysis"| EntityExt["Entity Extraction"]
EntityExt --> |"Named entities"| RelEngine["Relationship Engine"]
RelEngine --> |"Entity pairs"| RelCreation["Relationship Creation"]
subgraph Neo4j_Storage ["π Neo4j Database"]
GraphDB[("Neo4j Graph DB")]
Indexes["Custom Indexes"]
TextIndex["Full-Text Search Index"]
VectorIndex["Vector Index"]
GraphDB --> |"Indexed by"| Indexes
GraphDB --> |"Text indexing"| TextIndex
GraphDB --> |"Vector indexing"| VectorIndex
end
VectorGen --> |"Store embeddings"| GraphDB
RelCreation --> |"Store relationships"| GraphDB
Chunks --> |"Store chunks"| GraphDB
end
subgraph Query_Processing ["π Query Processing"]
UserQuery[/"User Question"/] --> QueryEmbed["Query Embedding"]
UserQuery --> QueryEntity["Query Entity Recognition"]
UserQuery --> TextSearch["Text Search Processing"]
subgraph Search_System ["Hybrid Search System"]
QueryEmbed --> |"Vector similarity (40%)"| HybridSearch
QueryEntity --> |"Entity matching (30%)"| HybridSearch
TextSearch --> |"Text matching (30%)"| HybridSearch
TextIndex --> |"Full-text results"| TextSearch
VectorIndex --> |"Vector results"| HybridSearch
GraphDB --> |"Graph traversal"| HybridSearch
HybridSearch --> |"Ranked & merged results"| ContextFusion
end
ContextFusion --> |"Combined context"| RespGen["Response Generation"]
RespGen --> Answer[/"Final Answer"/]
end
%% Styling
classDef input fill:#e3f2fd,stroke:#1565c0,stroke-width:2px;
classDef process fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
classDef database fill:#fff3e0,stroke:#ef6c00,stroke-width:2px;
classDef search fill:#f3e5f5,stroke:#6a1b9a,stroke-width:2px;
classDef output fill:#fce4ec,stroke:#c2185b,stroke-width:2px;
class DocInput,UserQuery input;
class TextProc,Chunking,VectorGen,EntityExt,RelEngine,RelCreation,QueryEmbed,QueryEntity,TextSearch process;
class GraphDB,Indexes,TextIndex,VectorIndex database;
class HybridSearch,ContextFusion search;
class Answer output;
%% Subgraph styling
style Document_Processing fill:#f8f9fa,stroke:#343a40,stroke-width:2px;
style Knowledge_Graph_Creation fill:#f8f9fa,stroke:#343a40,stroke-width:2px;
style Query_Processing fill:#f8f9fa,stroke:#343a40,stroke-width:2px;
style Neo4j_Storage fill:#fff3e0,stroke:#ef6c00,stroke-width:2px;
style Search_System fill:#f3e5f5,stroke:#6a1b9a,stroke-width:2px;
%% Relationships
linkStyle default stroke:#666,stroke-width:2px;
- Splits documents into manageable chunks
- Generates vector embeddings for each chunk
- Stores content in Neo4j database
When you ask a question:
- Converts question to vector embedding
- Finds relevant document chunks using multiple strategies:
- Vector similarity search
- Full-text search
- Graph-based search
- Merges and ranks results
- Generates comprehensive answer
(Document)-[:HAS_CHUNK]->(DocumentChunk)
(DocumentChunk)-[:HAS_ENTITY]->(Entity)
(Entity)-[:RELATES_TO]->(Entity)
We welcome contributions! Please check our contributing guidelines for more information.
RESEARCH PURPOSES ONLY. This project, docuGraphRAG.js
, is strictly intended for research and educational exploration. It has not been designed or tested for production environments and may contain limitations, errors, or security vulnerabilities.
MIT License
- Create an issue for bug reports
- Start a discussion for feature requests
- Check our documentation for guides
Built with β€οΈ by Yannis Kolovos