This project extracts content from textbooks, creates hierarchical indexes, implements hybrid retrieval techniques, and develops a Retrieval Augmented Generation (RAG) system for answering questions based on the retrieved content.
- Python 3.7+
- transformers
- rank_bm25
- PyMuPDF
- nltk
Extract content from PDF textbooks.
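Here is a minimal sketch of the extraction step with PyMuPDF (imported as `fitz`). It assumes the notebook works page by page. `extract_pages` returns cleaned text for each page, and `extract_toc` reads the PDF's built-in outline, which the indexing step can reuse. The function names and the cleanup rules (re-joining words hyphenated across line breaks, collapsing whitespace) are illustrative and not taken from the project. Recent PyMuPDF releases use snake_case methods such as `get_text` and `get_toc`.

```python
import re
from typing import Dict, List


def clean_page_text(text: str) -> str:
    """Join words hyphenated across line breaks and collapse whitespace.

    Note: this may also merge a genuine compound that happens to break at a
    line end (e.g. "state-\\nof-the-art" -> "stateof-the-art").
    """
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # "retrie-\nval" -> "retrieval"
    text = re.sub(r"\s+", " ", text)               # newlines/tabs/runs -> one space
    return text.strip()


def extract_pages(pdf_path: str) -> List[Dict]:
    """Return one record per page: {"page": 1-based number, "text": cleaned text}."""
    import fitz  # PyMuPDF; imported here so clean_page_text works without it

    pages = []
    with fitz.open(pdf_path) as doc:
        for number, page in enumerate(doc, start=1):
            pages.append({"page": number, "text": clean_page_text(page.get_text())})
    return pages


def extract_toc(pdf_path: str) -> List[list]:
    """Return the PDF outline as [level, title, page] entries (empty if none)."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        return doc.get_toc()
```

The cleanup lives in its own function so it can be tuned per textbook (some layouts need header/footer stripping as well) without touching the PDF-reading code.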
Create hierarchical tree-based indexes from the extracted content.
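One way to build the hierarchical index assumes the textbook PDF has an outline whose entries look like `[level, title, page]`, which is the format PyMuPDF's `get_toc()` returns. Each entry becomes a node, and a stack tracks the current chapter/section path. `Node`, `build_tree`, and `section_text` are hypothetical names. Books without an outline would need heading detection instead.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One entry in the book tree: the book itself, a chapter, a section, ..."""
    title: str
    level: int          # 0 = book (root), 1 = chapter, 2 = section, ...
    start_page: int     # 1-based, inclusive
    end_page: int       # 1-based, inclusive
    children: List["Node"] = field(default_factory=list)


def build_tree(toc: List[list], num_pages: int, book_title: str = "Book") -> Node:
    """Turn flat [level, title, page] outline entries into a nested tree."""
    root = Node(book_title, 0, 1, num_pages)
    stack = [root]  # path from the root to the most recently added node
    for i, (level, title, page, *_) in enumerate(toc):
        # A section ends just before the next entry at the same or a higher level.
        end = num_pages
        for next_level, _, next_page, *_ in toc[i + 1:]:
            if next_level <= level:
                end = max(page, next_page - 1)
                break
        node = Node(title, level, page, end)
        while stack[-1].level >= level:  # climb back up to this node's parent
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def section_text(node: Node, pages: List[str]) -> str:
    """Concatenate the text of the node's pages (pages[0] is page 1)."""
    return " ".join(pages[node.start_page - 1 : node.end_page])
```

The leaves (the finest sections) can then serve as retrieval units, while their parent chapters keep the surrounding context for coarse-to-fine search.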
Implement and test BM25, DPR, and hybrid retrieval techniques.
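The hybrid step can be sketched as score fusion. Each retriever's scores are rescaled to [0, 1] with min-max normalization, and the two are combined as a weighted sum. This sketch assumes BM25 scores come from something like `rank_bm25`'s `BM25Okapi(tokenized_passages).get_scores(tokenized_query)`. It assumes DPR scores are dot products between question and passage embeddings, for example from transformers' `DPRQuestionEncoder` and `DPRContextEncoder`. The fusion rule and the `alpha` weight are illustrative choices, not necessarily what the notebook uses.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(bm25: Sequence[float], dense: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Weighted sum of normalized sparse (BM25) and dense (DPR) scores per passage.

    alpha = 1.0 uses BM25 only; alpha = 0.0 uses DPR only.
    """
    if len(bm25) != len(dense):
        raise ValueError("both score lists must cover the same passages")
    return [alpha * b + (1 - alpha) * d for b, d in zip(min_max(bm25), min_max(dense))]


def top_k(scores: Sequence[float], k: int = 3) -> List[int]:
    """Indices of the k highest-scoring passages, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

Normalizing first matters because BM25 scores are unbounded and grow with query length, while dense dot products live on a different scale. Without it, one retriever would dominate the sum regardless of `alpha`.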
Combine the retrieval techniques to handle multiple documents and sections.
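To merge results across several books and their sections, one option is reciprocal rank fusion (RRF). It is used here as an example and is not taken from the project. Each retriever returns a ranked list of section IDs over all documents. Every section then scores the sum of 1 / (k + rank) across the lists it appears in. The `(book, section title)` ID scheme is hypothetical, and k = 60 is the constant proposed in the original RRF paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical ID scheme: (book name, section title) identifies one section
SectionId = Tuple[str, str]


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[SectionId]], k: int = 60
) -> List[Tuple[SectionId, float]]:
    """Merge ranked lists: score(s) = sum over lists of 1 / (k + rank of s), rank from 1."""
    fused: Dict[SectionId, float] = defaultdict(float)
    for ranking in rankings:
        for rank, section_id in enumerate(ranking, start=1):
            fused[section_id] += 1.0 / (k + rank)
    # Sections ranked well by several retrievers rise to the top.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

RRF uses only ranks, so BM25 and DPR scores never need a common scale. This helps when each book has its own index, because BM25 statistics such as IDF differ between corpora.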
Integrate a language model to generate answers based on the retrieved content.
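The generation step can be sketched as prompt construction followed by a model call. To keep the sketch runnable without downloading a model, the language model is passed in as a plain `generate` function. The prompt wording, the `(book, section, text)` passage format, and the `max_chars` budget are assumptions.

```python
from typing import Callable, List, Tuple

# Hypothetical passage format: (book, section title, text)
Passage = Tuple[str, str, str]


def build_prompt(question: str, passages: List[Passage], max_chars: int = 2000) -> str:
    """Pack retrieved passages, labeled with their source, into a grounded prompt."""
    context, used = [], 0
    for book, section, text in passages:
        chunk = f"[{book} | {section}] {text}"
        if used + len(chunk) > max_chars:
            break  # stop before exceeding the rough context budget
        context.append(chunk)
        used += len(chunk)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"
    )


def answer(question: str, passages: List[Passage], generate: Callable[[str], str]) -> str:
    """RAG step: retrieved passages -> prompt -> language model -> answer text."""
    return generate(build_prompt(question, passages)).strip()
```

With transformers, and assuming a version that provides the `text2text-generation` pipeline task, `generate` could wrap `gen = pipeline("text2text-generation", model="google/flan-t5-base")` as `lambda p: gen(p, max_new_tokens=64)[0]["generated_text"]`. Labeling each passage with its book and section lets the answer cite where it came from.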
Copy the Jupyter notebook and run its cells in order to perform each step. Don't forget to add your own PDF textbooks first.