GitHub - Vetrivel07/NewsBot: NewsBot is an AI-powered Q&A tool that answers questions from uploaded PDFs or web article URLs using LangChain, OpenAI, and Streamlit.

NewsBot: AI-Powered News Q&A

NewsBot is an AI-driven content analysis tool for extracting insights from news articles and documents. Paste a URL or upload a PDF, and NewsBot parses the content, indexes it for semantic search, and answers your questions using OpenAI language models. Built with Streamlit, LangChain, and OpenAI, it combines on-demand document parsing, FAISS-based semantic search, and LLM answer generation to return answers together with their sources.

✨ Features

📎 Accepts URLs or PDF uploads as content sources
🔍 Splits and embeds text using OpenAI Embeddings
⚡ Fast and accurate Q&A using FAISS vector search
📚 Displays answers along with the original source
🧠 Powered by LangChain and OpenAI GPT models
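
The "splits text" feature can be illustrated with a minimal sketch of overlapping chunking. This is not the project's code: NewsBot uses LangChain's `RecursiveCharacterTextSplitter`, which also tries to split on natural separators, while the sketch below uses plain fixed-size windows. The function name `split_text` and the overlap of 200 are assumptions made for illustration, and sizes here are counted in characters.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified stand-in for a recursive splitter: it ignores paragraph
    and sentence boundaries and only slides a window over the text.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Small example: windows of 4 characters that share 1 character.
example_chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval later on.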

NewsBot Architecture Overview

                        ┌─────────────────────────────┐
                        │        Streamlit UI         │
                        │ - Sidebar: URL/PDF inputs   │
                        │ - Main: Question + Answer   │
                        └────────────┬────────────────┘
                                     │
                            ▼ Process Trigger
                      (when user clicks "Load Source")
                                     │
                                     ▼
        ┌─────────────────────────────────────────────────────┐
        │          Content Loading & Preprocessing            │
        │─────────────────────────────────────────────────────│
        │ If URL(s):                                          │
        │   → UnstructuredURLLoader fetches & parses text     │
        │                                                     │
        │ If PDF:                                             │
        │   → PyPDF2 reads pages                              │
        │   → LangChain `Document` created from text          │
        └────────────────────┬────────────────────────────────┘
                             │
                             ▼
        ┌─────────────────────────────────────────────────────┐
        │       Recursive Text Chunking (LangChain)           │
        │ - Uses RecursiveCharacterTextSplitter               │
        │ - Breaks content into 1000-token overlapping chunks │
        └────────────────────┬────────────────────────────────┘
                             │
                             ▼
        ┌─────────────────────────────────────────────────────┐
        │         OpenAI Embeddings + FAISS Indexing          │
        │ - Embeds chunks using OpenAIEmbeddings              │
        │ - Stores vectors in FAISS index                     │
        │ - Saves (index, docstore, id_map) in pickle file    │
        └────────────────────┬────────────────────────────────┘
                             │
                             ▼
        ┌─────────────────────────────────────────────────────┐
        │          Question-Answer Inference Pipeline         │
        │ - User asks question                                │
        │ - FAISS index loaded                                │
        │ - Vector similarity search retrieves top docs       │
        │ - LangChain `RetrievalQAWithSourcesChain` runs LLM  │
        │ - GPT generates final answer using retrieved docs   │
        └────────────────────┬────────────────────────────────┘
                             │
                             ▼
                        ┌───────────────┐
                        │  UI Output    │
                        │ - Answer      │
                        │ - Source(s)   │
                        └───────────────┘
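
The retrieval step in the diagram above (similarity search that returns the top documents and their sources) can be sketched with the standard library alone. This is a toy illustration under loud assumptions: the bag-of-words `embed` function stands in for OpenAI embeddings, the brute-force `top_k` search stands in for a FAISS index, and the chunk texts and sources are made up. The LLM call made by `RetrievalQAWithSourcesChain` is left out.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[dict], k: int = 2) -> list[dict]:
    """Return the k chunks most similar to the question (brute-force search)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]


# Hypothetical chunks, each tagged with the source it came from.
chunks = [
    {"text": "the central bank raised interest rates", "source": "https://example.com/a"},
    {"text": "the football team won the final", "source": "https://example.com/b"},
    {"text": "inflation slowed after interest rates rose", "source": "report.pdf"},
]

hits = top_k("what happened to interest rates", chunks, k=2)
sources = [h["source"] for h in hits]  # shown in the UI next to the answer
```

Keeping the source alongside each chunk is what lets the final stage display where an answer came from.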

📸 Screenshot

[Image: Index]

🔮 Future Enhancements

  1. 🖼️ Image-Based Text Extraction
  • Integrate Optical Character Recognition (OCR) to extract and analyze text from images embedded within documents or uploaded directly.
  2. 📝 Handwritten & Scanned Document Support
  • Extend compatibility to scanned PDFs and handwritten content using OCR tools, enabling processing of a broader range of document types.
  3. 📚 Multi-Document Cross Analysis
  • Allow users to submit multiple documents or articles simultaneously for comparison, aggregation, and context-aware question answering.
  4. 🧠 Conversational Memory (Chain Memory)
  • Introduce memory components that preserve the context of previous interactions, enabling multi-turn dialogue and a more natural Q&A experience.
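
As a rough idea of what the conversational memory item could look like, here is a minimal sketch of a buffer that keeps the last few question/answer turns and prepends them to the next prompt. It is not part of NewsBot: the class `ConversationBuffer`, its `max_turns` parameter, and the prompt layout are all hypothetical, and a LangChain memory component may behave differently.

```python
from collections import deque


class ConversationBuffer:
    """Keeps only the most recent question/answer turns (hypothetical helper)."""

    def __init__(self, max_turns: int = 3):
        # deque with maxlen silently drops the oldest turn when full
        self.turns = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        """Record one completed turn."""
        self.turns.append((question, answer))

    def build_prompt(self, new_question: str) -> str:
        """Prefix the new question with the remembered turns."""
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.turns)
        prefix = history + "\n" if history else ""
        return f"{prefix}Q: {new_question}\nA:"


buffer = ConversationBuffer(max_turns=2)
buffer.add("Who wrote the article?", "The byline names one reporter.")
buffer.add("When was it published?", "The article is dated in March.")
prompt = buffer.build_prompt("What is its main claim?")
```

Capping the number of remembered turns keeps the prompt length bounded, at the cost of forgetting older context.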

Author

👤 Vetrivel Maheswaran

Connect With Me 🌐

LinkedIn

Portfolio

© Created by Vetrivel Maheswaran
