May metadata improvements by zoner72 · Pull Request #7 · zoner72/Datavizion-RAG · GitHub

May metadata improvements #7

Merged: 6 commits, May 17, 2025
Conversation

zoner72 (Owner) commented May 17, 2025

Improved scraping
Additional metadata
Comments on config tab fields
Bug fixes

zoner72 and others added 6 commits May 10, 2025 16:22
…rallel) and add_documents (chunks→vectors→Qdrant in batches)

feat: preserve source_filepath in each Point’s payload and record last_modified for refresh logic
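
A rough sketch of what such a payload could look like. Only the `source_filepath` and `last_modified` keys come from the commit message; the helper name `build_payload` and the `text` key are hypothetical and may not match the project's code.

```python
import os


def build_payload(source_filepath: str, chunk_text: str) -> dict:
    """Build a per-chunk payload (hypothetical helper, not the project's API)."""
    return {
        "source_filepath": source_filepath,
        # On-disk modification time in seconds since the epoch, captured at
        # indexing time so a later refresh can compare against it.
        "last_modified": os.path.getmtime(source_filepath),
        # Assumed key for the chunk text; the real payload layout is not shown.
        "text": chunk_text,
    }
```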

feat: implement refresh_index to only re‑index files whose on‑disk mtime > indexed mtime (via Qdrant scroll)
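
The mtime comparison can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: `files_needing_refresh` is a hypothetical name, and the indexed mtimes are passed in as a plain dict, whereas the PR gathers them via a Qdrant scroll.

```python
import os
from typing import Callable, Dict, Iterable, List


def files_needing_refresh(
    paths: Iterable[str],
    indexed_mtimes: Dict[str, float],
    get_mtime: Callable[[str], float] = os.path.getmtime,
) -> List[str]:
    """Return the paths whose on-disk mtime is newer than the indexed mtime."""
    stale = []
    for path in paths:
        indexed = indexed_mtimes.get(path)
        # Assumption: a file with no indexed mtime is treated as stale too;
        # the commit message does not say how new files are handled.
        if indexed is None or get_mtime(path) > indexed:
            stale.append(path)
    return stale
```

Keeping `get_mtime` injectable lets the comparison be exercised without touching the filesystem.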

feat: implement rebuild_index to clear collection and fully re‑chunk & re‑index all files under data_directory

fix: unify cancellation flag parameter (worker_flag) and remove extra worker_is_running_flag kwarg so GUI’s IndexWorker.run() matches signature
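
One way a single cancellation flag could be honored inside an indexing loop is sketched below. The flag's type and meaning are assumptions: here `worker_flag` is a `threading.Event` that is set when a stop is requested, and the project may use a different type or the opposite convention. `index_files` and `index_one` are hypothetical names.

```python
import threading
from typing import Callable, Iterable, List


def index_files(
    paths: Iterable[str],
    worker_flag: threading.Event,
    index_one: Callable[[str], None],
) -> List[str]:
    """Index files one by one, stopping early once worker_flag is set."""
    done = []
    for path in paths:
        # Check the flag before each file so a stop request takes effect
        # between files rather than in the middle of one.
        if worker_flag.is_set():
            break
        index_one(path)
        done.append(path)
    return done
```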

chore: introduce WorkerConfig dataclass for chunking settings (size, overlap, clean_html, lowercase, file_filters) and use asdict() on it

fix: update _get_worker_config() to return WorkerConfig instead of raw dict to satisfy asdict() precondition
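
The `asdict()` precondition mentioned above can be illustrated with a small sketch. The field names follow the commit message; their types and default values are assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class WorkerConfig:
    # Field names are from the commit message; types and defaults are guesses.
    size: int = 512
    overlap: int = 64
    clean_html: bool = True
    lowercase: bool = False
    file_filters: List[str] = field(default_factory=list)


def config_as_dict(config: WorkerConfig) -> dict:
    """Convert the config to a plain dict, e.g. to hand to a worker process."""
    # asdict() accepts only dataclass instances; passing a raw dict
    # raises TypeError, which is why the getter must return WorkerConfig.
    return asdict(config)
```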

feat: carry original_chunk_id through metadata for traceability

style: align naming and logging in QdrantIndexManager with existing code conventions

test: manually verify add/refresh/rebuild flows and that “stop” from GUI cleanly interrupts indexing
zoner72 merged commit e38f9c7 into main on May 17, 2025