AI-Powered Document Q&A

This is a Streamlit application that lets users upload a PDF document and ask questions about its content using AI-powered natural language processing (NLP) tools. The app uses LangChain, OpenAI's GPT-4o mini model, and FAISS (Facebook AI Similarity Search) for document retrieval and question answering.

Requirements

The following libraries are required for this project:

  • faiss-cpu >= 1.10.0
  • langchain-community >= 0.3.18
  • langchain-huggingface >= 0.1.2
  • langchain-openai >= 0.3.7
  • langchain >= 0.3.19
  • openai >= 1.64.0
  • pypdf >= 5.3.0
  • sentence-transformers >= 3.4.1
  • streamlit >= 1.42.2
  • tiktoken >= 0.9.0

In addition, the project uses two local modules that live in this repository (they are not pip packages): scripts.secret, which stores your OpenAI API key, and scripts.document_loader, which loads and splits PDF documents. Illustrative sketches of both appear later in this README.

You can install the necessary dependencies using pip:

pip install streamlit langchain langchain-community langchain-huggingface langchain-openai faiss-cpu openai pypdf sentence-transformers tiktoken
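
The scripts/secret.py module is not reproduced in this README. A minimal sketch of what it might contain, assuming the key is read from an environment variable (the real module may differ):

# scripts/secret.py -- illustrative sketch only; the actual module may differ.
# Keep this file out of version control so your key is never committed.
import os

# Read the key from the environment; alternatively, assign the string directly.
OPENAI_KEY = os.environ["OPENAI_API_KEY"]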

Overview

The application works as follows:

  • The user uploads a PDF document.
  • The application processes the document by splitting it into chunks.
  • It builds a vector database from OpenAI embeddings for efficient document retrieval.
  • It uses the GPT-4o mini model to answer questions related to the document.

Demonstration

(Screenshot and demo GIF of the app.)

How it Works

  1. Document Upload: The user uploads a PDF document via the Streamlit interface. The file is saved temporarily on the server, and the document is then processed into text chunks for easier retrieval.

  2. Text Splitting and Vector Database Creation: Once the document is uploaded:

  • The document is loaded and split into smaller chunks to optimize retrieval performance.
  • Embeddings (numerical representations of the document's contents) are generated using the OpenAIEmbeddings model.
  • A FAISS vector store is created from these embeddings, allowing efficient similarity search.

  3. Question Answering: The system uses a retrieval-augmented generation (RAG) approach to answer user queries. When a user asks a question, the system retrieves the most relevant chunks from the document with FAISS and passes them to GPT-4o mini to generate a grounded answer.

  4. User Interface: The application presents:

  • An input field to upload the PDF file.
  • A text box to ask questions about the document.
  • Responses generated by the AI based on the content of the document.

Code Walkthrough

Imports

The required libraries are imported to handle document processing, vector database creation, and AI-powered responses:

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.chains import create_retrieval_chain
from langchain_openai import ChatOpenAI
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from scripts.secret import OPENAI_KEY
from scripts.document_loader import load_document
import streamlit as st

Streamlit App Setup

The Streamlit app is initialized, with the title displayed and a file uploader for PDFs:

st.title("AI-Powered Document Q&A")
uploaded_file = st.file_uploader("Upload a PDF file", type="pdf")

Document Processing

When a file is uploaded, the following steps are performed:

  • The document is saved temporarily.
  • The document is split into chunks using the load_document function.
temp_file = "./temp.pdf"
with open(temp_file, "wb") as file:
    file.write(uploaded_file.getvalue())
chunks = load_document(temp_file)
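
The load_document helper lives in scripts/document_loader.py and is not reproduced in this README. A minimal sketch of what it might look like, assuming PyPDFLoader and RecursiveCharacterTextSplitter (the chunk sizes are illustrative, not necessarily the repo's actual values):

# scripts/document_loader.py -- illustrative sketch; the real module may differ.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

def load_document(path: str):
    """Load a PDF and split it into overlapping text chunks."""
    pages = PyPDFLoader(path).load()  # one Document per PDF page
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,    # max characters per chunk (example value)
        chunk_overlap=200,  # overlap preserves context across chunk boundaries
    )
    return splitter.split_documents(pages)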

Embeddings and Vector Store

Embeddings are generated using OpenAI's model (text-embedding-ada-002), and a FAISS vector store is created to hold these embeddings:

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_KEY, model="text-embedding-ada-002")
vector_db = FAISS.from_documents(chunks, embeddings)
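
Although the app never calls it directly, the vector store can also be queried on its own, which is a handy sanity check that embedding and indexing worked (the query string below is just an example):

# Quick sanity check: fetch the two chunks most similar to a test query.
docs = vector_db.similarity_search("What is this document about?", k=2)
for doc in docs:
    print(doc.page_content[:200])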

Question Answering Chain

The question-answering chain is created by combining the FAISS retriever with a document combination chain built by create_stuff_documents_chain. The model used for answering questions is GPT-4o mini:

retriever = vector_db.as_retriever()
llm = ChatOpenAI(model_name="gpt-4o-mini", openai_api_key=OPENAI_KEY)
# create_stuff_documents_chain requires a {context} placeholder, which is
# filled with the retrieved chunks at query time.
system_prompt = "You are a helpful assistant. Use the given context to answer the question:\n\n{context}"
prompt = ChatPromptTemplate.from_messages([("system", system_prompt), ("human", "{input}")])
question_answer_chain = create_stuff_documents_chain(llm, prompt)
chain = create_retrieval_chain(retriever, question_answer_chain)
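
By default, as_retriever() returns the four chunks most similar to each question. If answers seem to miss relevant passages, the retriever can be asked for more context (the value 6 below is arbitrary, for illustration only):

# Retrieve more chunks per question than the default of 4.
retriever = vector_db.as_retriever(search_kwargs={"k": 6})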

User Interaction

The user is prompted to input a question, and the system responds with an answer based on the document's content:

question = st.text_input("Ask a question about the document:")
if question:
    response = chain.invoke({"input": question})['answer']
    st.write(response)

Running the App

To run the app, execute the following command in your terminal:

streamlit run app.py

This will start the Streamlit server and open the app in your browser.

License

This project is licensed under the MIT License.

Author

Gustavo R. Santos

LinkedIn
Website
Medium Blog
