this readme was created by chatgpt
This project processes and stores documents in a Chroma vector database for semantic search and retrieval tasks using LangChain and OpenAI embeddings. It integrates local files and web content, splitting text into manageable chunks for efficient vector search operations.
- Load and Process Documents: Supports local
.md
files and web URLs as document sources. - Text Splitting: Uses recursive splitting for efficient chunking and indexing.
- Vector Embedding Generation: Converts text chunks into high-dimensional vectors using OpenAI embeddings.
- Chroma Database: Stores vector embeddings for persistence and fast similarity search.
- Python 3.8+
- Dependencies listed in
requirements.txt
- Clone the Repository:
git clone https://github.com/yourusername/document-chroma-vectorstore.git cd document-chroma-vectorstore