Textbook Content Extraction and RAG System

This project extracts content from PDF textbooks, builds hierarchical indexes over that content, implements BM25, DPR, and hybrid retrieval, and feeds the retrieved passages into a Retrieval-Augmented Generation (RAG) system that answers questions.

Requirements

Python 3.7+
transformers
rank_bm25
PyMuPDF
nltk

Usage

Step 1: Text Extraction

Extract content from PDF textbooks.
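
As a rough illustration of this step, the sketch below reads a PDF page by page with PyMuPDF and tidies the raw text. The helper names `clean_page_text` and `extract_pages` are hypothetical and are not taken from the notebook; the cleanup rules (rejoining words hyphenated across line breaks, collapsing whitespace) are assumptions about what a textbook PDF typically needs.

```python
import re
from typing import List


def clean_page_text(text: str) -> str:
    """Rejoin words hyphenated across line breaks and collapse whitespace."""
    # "implemen-\ntation" -> "implementation"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Any run of spaces, tabs, or newlines becomes a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def extract_pages(pdf_path: str) -> List[str]:
    """Return one cleaned string per page of the PDF at pdf_path (hypothetical helper)."""
    import fitz  # PyMuPDF; imported lazily so clean_page_text works without it

    with fitz.open(pdf_path) as doc:
        return [clean_page_text(page.get_text()) for page in doc]
```

Calling `extract_pages("my_textbook.pdf")` (a placeholder path) would yield a list of page strings that the later steps can index.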

Step 2: Hierarchical Tree Indexing

Create hierarchical tree-based indexes from the extracted content.
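
One possible way to build such an index is sketched below, assuming that section headings in the extracted text start with a dotted number such as `1`, `1.1`, or `2.3.1`. That heading pattern, the `Node` class, and `build_tree` are illustrative assumptions, not the notebook's actual implementation; real textbooks may need a different heading detector (for example, the PDF's table of contents).

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumed heading format: dotted section number, whitespace, then a title.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(\S.*)$")


@dataclass
class Node:
    number: str
    title: str
    text: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    @property
    def depth(self) -> int:
        # "" -> 0 (root), "1" -> 1, "1.2" -> 2, ...
        return self.number.count(".") + 1 if self.number else 0


def build_tree(lines: List[str]) -> Node:
    """Nest numbered headings into a tree; body lines attach to the latest heading."""
    root = Node(number="", title="ROOT")
    stack = [root]
    for raw in lines:
        line = raw.strip()
        match = HEADING.match(line)
        if match:
            node = Node(number=match.group(1), title=match.group(2))
            # Climb back up until the top of the stack is a shallower section.
            while stack[-1].depth >= node.depth:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        elif line:
            stack[-1].text.append(line)
    return root
```

Note that this naive pattern also treats a body line that happens to begin with a number as a heading, so a production version would need a stricter check.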

Step 3: Retrieval Techniques

Implement and test BM25, DPR, and hybrid retrieval techniques.
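
A minimal sketch of how the sparse and dense scores might be combined is shown below. It assumes BM25 scores come from `rank_bm25` and that dense (DPR) similarity scores have already been computed elsewhere, one per passage in the same order. The min-max normalization, the weight `alpha`, and the function names are assumptions chosen for illustration; other fusion schemes (such as reciprocal rank fusion) are equally plausible.

```python
from typing import List, Sequence


def bm25_scores(tokenized_corpus: List[List[str]], tokenized_query: List[str]) -> List[float]:
    """Score every passage against the query with BM25 (via rank_bm25)."""
    from rank_bm25 import BM25Okapi  # imported lazily; only needed for this helper

    bm25 = BM25Okapi(tokenized_corpus)
    return [float(s) for s in bm25.get_scores(tokenized_query)]


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; constant or empty inputs map to zeros."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(sparse: Sequence[float], dense: Sequence[float],
                alpha: float = 0.5, top_k: int = 3) -> List[int]:
    """Return passage indices ranked by alpha * sparse + (1 - alpha) * dense."""
    if len(sparse) != len(dense):
        raise ValueError("sparse and dense must score the same passages")
    fused = [alpha * s + (1 - alpha) * d
             for s, d in zip(min_max(sparse), min_max(dense))]
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return order[:top_k]
```

Setting `alpha=1.0` reduces this to pure BM25 ranking and `alpha=0.0` to pure dense ranking, which makes it easy to compare the three techniques on the same queries.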

Step 4: Multi-document RAG

Combine the retrieval techniques to handle multiple documents and sections.
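
The sketch below shows one way to merge per-document retrieval results into a single ranked list while keeping track of which book and section each passage came from. The `Hit` record and `merge_hits` function are hypothetical, and the sketch assumes the scores are comparable across documents (for example, normalized per query), which raw BM25 scores from separate indexes generally are not.

```python
import heapq
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    score: float
    doc_id: str
    section: str
    text: str


def merge_hits(per_document: Dict[str, List[Tuple[str, str, float]]],
               top_k: int = 5) -> List[Hit]:
    """Merge {doc_id: [(section, text, score), ...]} into the global top_k hits."""
    all_hits = (
        Hit(score, doc_id, section, text)
        for doc_id, hits in per_document.items()
        for section, text, score in hits
    )
    # nlargest returns hits sorted from highest to lowest score.
    return heapq.nlargest(top_k, all_hits, key=lambda h: h.score)
```

Keeping `doc_id` and `section` on every hit lets the answer cite its source textbook and section later on.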

Step 5: Question Answering

Integrate a language model to generate answers based on the retrieved content.
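
As an illustration, the sketch below assembles the retrieved passages into a prompt and hands it to a text generator. The prompt wording and the `build_prompt` and `answer` helpers are assumptions; `generate` stands for any callable that maps a prompt string to generated text (for instance, a thin wrapper around a `transformers` model), so the sketch does not commit to a specific model.

```python
from typing import Callable, List


def build_prompt(question: str, passages: List[str]) -> str:
    """Number the retrieved passages and place them before the question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question: str, passages: List[str],
           generate: Callable[[str], str]) -> str:
    """Generate an answer grounded in the retrieved passages."""
    return generate(build_prompt(question, passages)).strip()
```

Passing the generator in as a parameter keeps retrieval and generation decoupled, so the language model can be swapped without touching the prompt logic.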

Running the Notebook

Copy the Jupyter notebook (Steps.ipynb) and run its cells in order to perform each step sequentially. Don't forget to add your own textbook PDFs first.
