AnshumanAI/Steps

Textbook Content Extraction and RAG System

This project extracts content from textbooks, creates hierarchical indexes, implements hybrid retrieval techniques, and develops a Retrieval-Augmented Generation (RAG) system that answers questions based on the retrieved content.

Requirements

  • Python 3.7+
  • transformers
  • rank_bm25
  • PyMuPDF
  • nltk

Usage

Step 1: Text Extraction

Extract content from PDF textbooks.
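A minimal sketch of this step, assuming PyMuPDF's `fitz.open()` and `Page.get_text()` interface and one plain-text string per page. `clean_text` and `extract_pages` are hypothetical helper names, not functions from this repository, and the notebook may extract or clean text differently.

```python
import re
from typing import List


def clean_text(raw: str) -> str:
    """Normalize text pulled from a PDF page (hypothetical helper)."""
    # Re-join words hyphenated across a line break, e.g. "retrie-\nval" -> "retrieval".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse remaining whitespace (newlines, tabs, repeated spaces) to single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def extract_pages(pdf_path: str) -> List[str]:
    """Return the cleaned text of every page in a PDF, in page order (hypothetical helper)."""
    import fitz  # PyMuPDF; imported lazily so clean_text works without it installed

    with fitz.open(pdf_path) as doc:
        return [clean_text(page.get_text()) for page in doc]
```

The hyphen rule is deliberately simple: it also merges genuine compounds that happen to break at a line end ("well-\nknown" becomes "wellknown"), so a real pipeline might check the joined word against a vocabulary first.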

Step 2: Hierarchical Tree Indexing

Create hierarchical tree-based indexes from the extracted content.
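One way to build such a tree, sketched under the assumption that the textbook PDF carries an embedded outline: PyMuPDF's `Document.get_toc()` returns rows of `[level, title, page]` in reading order, with level 1 for chapters. Books without bookmarks would need headings detected some other way. `Node`, `build_tree`, and `outline` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    title: str
    level: int  # 0 = synthetic root, 1 = chapter, 2 = section, ...
    page: int
    children: List["Node"] = field(default_factory=list)


def build_tree(toc: Sequence[Sequence]) -> Node:
    """Turn flat (level, title, page) rows into a nested tree.

    Rows must be in reading order with levels >= 1, as get_toc() produces.
    """
    root = Node(title="ROOT", level=0, page=0)
    stack = [root]  # path from the root to the most recently added node
    for level, title, page, *_ in toc:
        # Pop until the top of the stack is a shallower heading: that is the parent.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(title=title, level=level, page=page)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def outline(node: Node) -> List[str]:
    """Flatten the tree into indented title lines, skipping the root."""
    lines: List[str] = []
    for child in node.children:
        lines.append("  " * (child.level - 1) + child.title)
        lines.extend(outline(child))
    return lines
```

The stack always holds the path from the root to the last heading seen, so each row attaches to its nearest shallower ancestor in a single pass. Page numbers on each node let extracted page text be assigned to the section it falls under.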

Step 3: Retrieval Techniques

Implement and test BM25 (sparse, keyword-based), DPR (Dense Passage Retrieval, embedding-based), and hybrid retrieval techniques that combine the two.
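The sketch below shows one common hybrid scheme: min-max normalize the sparse and dense scores, then take a weighted sum. To stay self-contained it uses a small from-scratch Okapi BM25; in the project, `rank_bm25`'s `BM25Okapi(tokenized_corpus).get_scores(tokenized_query)` would fill this role, and its IDF formula differs slightly, so absolute scores will not match. The dense scores are assumed to be passed in; with DPR they would typically be dot products between `DPRQuestionEncoder` and `DPRContextEncoder` embeddings from `transformers`. The weighting `alpha` and all function names are illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized doc for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # assumes at least one non-empty doc
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def min_max(xs: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; constant inputs map to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def hybrid_scores(sparse: Sequence[float], dense: Sequence[float],
                  alpha: float = 0.5) -> List[float]:
    """alpha * normalized dense score + (1 - alpha) * normalized sparse score."""
    return [alpha * d + (1 - alpha) * s
            for s, d in zip(min_max(sparse), min_max(dense))]
```

Normalizing first matters because BM25 scores are unbounded while DPR dot products live on a different scale; without it, one retriever would dominate the sum regardless of `alpha`.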

Step 4: Multi-document RAG

Combine the retrieval techniques to handle multiple documents and sections.
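As one possible approach, swapped in here rather than taken from the repository, ranked lists from several books and retrievers can be merged with reciprocal rank fusion (RRF): each passage scores the sum of `1 / (k + rank)` over the lists it appears in, with `k = 60` as the commonly used default. The `(book, section, chunk_index)` key layout is a hypothetical choice that keeps provenance attached to every passage.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical passage key: (book, section, chunk_index).
Key = Tuple[str, str, int]


def reciprocal_rank_fusion(rankings: Sequence[Sequence[Key]], k: int = 60,
                           top_n: int = 5) -> List[Tuple[Key, float]]:
    """Fuse several best-first rankings into one list of (key, score).

    A passage's score is sum(1 / (k + rank)) over every list containing it,
    with rank starting at 1. Only positions matter, so BM25 and DPR lists
    with differently scaled raw scores combine without normalization.
    """
    fused: Dict[Key, float] = defaultdict(float)
    for ranking in rankings:
        for rank, key in enumerate(ranking, start=1):
            fused[key] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sorted is stable).
    ordered = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return ordered[:top_n]
```

Because every key names its book and section, the fused list can be grouped or cited by source when the answer is generated, and passages that several retrievers agree on rise to the top.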

Step 5: Question Answering

Integrate a language model to generate answers based on the retrieved content.
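Before calling a language model, the retrieved passages have to be packed into a prompt. The sketch below assumes passages arrive best-first as `(source label, text)` pairs and uses a character budget as a rough stand-in for the model's token limit; `build_prompt` and the label format are hypothetical. The resulting string could then be passed to a `transformers` text-generation or `text2text-generation` pipeline, but the choice of model is left open here.

```python
from typing import List, Sequence, Tuple

# Hypothetical passage format: (source label such as "BookA 2.1", passage text).
Passage = Tuple[str, str]


def build_prompt(question: str, passages: Sequence[Passage],
                 max_context_chars: int = 2000) -> str:
    """Assemble a grounded QA prompt from best-first passages.

    Passages are added in order until the next one would exceed the character
    budget (separators are not counted), a crude proxy for a token limit.
    """
    blocks: List[str] = []
    used = 0
    for label, text in passages:
        block = f"[{label}] {text}"
        if used + len(block) > max_context_chars:
            break
        blocks.append(block)
        used += len(block)
    context = "\n".join(blocks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Keeping the source label in front of each passage lets the model, or a later post-processing step, point back to the book and section an answer came from.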

Running the Notebook

Copy the Jupyter notebook and run its cells in order to perform each step. Don't forget to add your own textbook PDFs.
