NewsBot: AI-Powered News Q&A

NewsBot is a question-answering tool for news articles and documents. Paste a URL or upload a PDF, and NewsBot parses the content, indexes it, and answers your questions about it using OpenAI language models. Built with Streamlit, LangChain, and OpenAI, it combines document parsing, semantic search over a FAISS index, and LLM-generated answers that cite the source they came from.

✨ Features

📎 Accepts URLs or PDF uploads as content sources
🔍 Splits and embeds text using OpenAI Embeddings
⚡ Fast and accurate Q&A using FAISS vector search
📚 Displays answers along with the original source
🧠 Powered by LangChain and OpenAI GPT models
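
The retrieval idea behind these features can be illustrated without any external services. The sketch below is a stand-in, not NewsBot's code: it swaps OpenAI embeddings for a toy bag-of-words vector and FAISS for a brute-force cosine-similarity scan. The names `embed`, `cosine`, and `top_k`, and the example URLs, are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, chunks, k=2):
    """Return the k chunks most similar to the question, best first."""
    query = embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query, embed(chunk["text"])),
        reverse=True,
    )
    return ranked[:k]


# Each chunk keeps its source so an answer can be shown with its origin.
chunks = [
    {"text": "The central bank raised interest rates on Tuesday",
     "source": "https://example.com/rates"},
    {"text": "The home team won the championship final",
     "source": "https://example.com/sports"},
]

best = top_k("Who raised interest rates", chunks, k=1)
print(best[0]["source"])
```

A real vector index such as FAISS avoids scanning every chunk, which matters once the number of chunks grows beyond a handful of documents.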

NewsBot Architecture Overview

                        ┌─────────────────────────────┐
                        │        Streamlit UI         │
                        │ - Sidebar: URL/PDF inputs   │
                        │ - Main: Question + Answer   │
                        └────────────┬────────────────┘
                                     │
                            ▼ Process Trigger
                      (when user clicks "Load Source")
                                     │
                                     ▼
        ┌─────────────────────────────────────────────────────┐
        │          Content Loading & Preprocessing            │
        │─────────────────────────────────────────────────────│
        │ If URL(s):                                          │
        │   → UnstructuredURLLoader fetches & parses text     │
        │                                                     │
        │ If PDF:                                             │
        │   → PyPDF2 reads pages                              │
        │   → LangChain `Document` created from text          │
        └────────────────────┬────────────────────────────────┘
                             │
                             ▼
        ┌─────────────────────────────────────────────────────┐
        │       Recursive Text Chunking (LangChain)           │
        │ - Uses RecursiveCharacterTextSplitter               │
        │ - Breaks content into 1000-token overlapping chunks │
        └────────────────────┬────────────────────────────────┘
                             │
                             ▼
        ┌─────────────────────────────────────────────────────┐
        │         OpenAI Embeddings + FAISS Indexing          │
        │ - Embeds chunks using OpenAIEmbeddings              │
        │ - Stores vectors in FAISS index                     │
        │ - Saves (index, docstore, id_map) in pickle file    │
        └────────────────────┬────────────────────────────────┘
                             │
                             ▼
        ┌─────────────────────────────────────────────────────┐
        │          Question-Answer Inference Pipeline         │
        │ - User asks question                                │
        │ - FAISS index loaded                                │
        │ - Vector similarity search retrieves top docs       │
        │ - LangChain `RetrievalQAWithSourcesChain` runs LLM  │
        │ - GPT generates final answer using retrieved docs   │
        └────────────────────┬────────────────────────────────┘
                             │
                             ▼
                        ┌───────────────┐
                        │  UI Output    │
                        │ - Answer      │
                        │ - Source(s)   │
                        └───────────────┘
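
The chunking stage in the diagram can be sketched in plain Python. This is a simplified illustration, not the app's implementation: it cuts fixed-size windows rather than splitting recursively on separators the way `RecursiveCharacterTextSplitter` does, and the 200-character overlap is an assumed value (the diagram only says the chunks overlap). Note that the LangChain splitter measures chunk size in characters by default unless a token-based length function is supplied, so this sketch counts characters as well. `chunk_text` is a hypothetical helper name.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character windows that overlap.

    chunk_size=1000 follows the diagram; overlap=200 is an assumption.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Example: a 2500-character document.
document = "".join(str(i % 10) for i in range(2500))
pieces = chunk_text(document)
print(len(pieces), [len(p) for p in pieces])
```

Overlap means the tail of one chunk reappears at the head of the next, so a sentence that straddles a boundary is still fully contained in at least one chunk when the index is searched.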

📸 Screenshot

🔮 Future Enhancements

🖼️ Image-Based Text Extraction

Integrate Optical Character Recognition (OCR) to extract and analyze text from images embedded within documents or uploaded directly.

📝 Handwritten & Scanned Document Support

Extend compatibility to scanned PDFs and handwritten content using OCR tools, enabling processing of a broader range of document types.

📚 Multi-Document Cross Analysis

Allow users to submit multiple documents or articles simultaneously for comparison, aggregation, and context-aware question answering.

🧠 Conversational Memory (Chain Memory)

Introduce memory components that preserve the context of previous interactions, enabling multi-turn dialogue and a more natural Q&A experience.
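
One possible shape for such a memory component, as a minimal sketch under stated assumptions: it keeps only the most recent question-answer pairs and prepends them to the next prompt. The class `ConversationMemory`, its methods, and the prompt layout are hypothetical and not part of NewsBot or LangChain.

```python
from collections import deque


class ConversationMemory:
    """Keeps the last `max_turns` question-answer pairs (hypothetical helper)."""

    def __init__(self, max_turns=3):
        # A bounded deque discards the oldest turn automatically.
        self.turns = deque(maxlen=max_turns)

    def add(self, question, answer):
        self.turns.append((question, answer))

    def build_prompt(self, question):
        """Prefix the new question with the remembered turns, oldest first."""
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.turns)
        current = f"Q: {question}\nA:"
        return f"{history}\n{current}" if history else current


memory = ConversationMemory(max_turns=2)
memory.add("What did the bank announce?", "A rate increase.")
memory.add("When?", "On Tuesday.")
print(memory.build_prompt("How large was it?"))
```

Bounding the history keeps the prompt within the model's context window, at the cost of forgetting older turns.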

Author

👤 Vetrivel Maheswaran
