8000 feat(ingestion): support delete ingested doc and garbage collection by zwpaper · Pull Request #4203 · TabbyML/tabby · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

feat(ingestion): support delete ingested doc and garbage collection #4203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 6, 2025

Conversation

zwpaper
Copy link
Member
@zwpaper zwpaper commented Apr 29, 2025

API

CleanShot 2025-04-29 at 21 42 20@2x

GC

  1. for deleting by source, GC by source id would do the cleanup.
  2. for deleting by id, list every structured doc source and id, and then range to check whether it is ingested doc and still existed in sqlite.

As for 2., Tantivy can not query ingested doc with streaming return, so we are iterating all structured docs

Test

ingested docs:

Postman 2025-04-29 21 36 19

ingested index:

CleanShot 2025-04-29 at 21 38 51@2x

deleted doc:
CleanShot 2025-04-29 at 21 40 38@2x

after GC:

ingested doc with id page 123 456 678 has gone in index:

CleanShot 2025-04-29 at 21 41 07@2x

@zwpaper zwpaper force-pushed the feat/ingestion-gc branch from f3552a2 to bd6be36 Compare April 29, 2025 13:49
@zwpaper zwpaper requested review from Copilot and wsxiaoys April 29, 2025 13:58
Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces support for deleting ingested documents and extends garbage collection to include deletion by source, while also updating the ingestion service and related endpoints.

  • Implements two new endpoints for deleting ingested documents (by document id and by source).
  • Updates background job runners and garbage collection logic to incorporate ingestion deletion.
  • Updates database queries and indexer functionality to support the new deletion features.

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Show a summary per file
File Description
ee/tabby-webserver/src/service/ingestion.rs Adds new ingestion service methods for getting and deleting ingested documents.
ee/tabby-webserver/src/service/background_job/mod.rs & hourly.rs Updates background job invocations to pass the ingestion service.
ee/tabby-webserver/src/routes/mod.rs & ingestion.rs Introduces new HTTP DELETE routes for ingestion deletions.
ee/tabby-schema/src/schema/ingestion.rs Updates the IngestionService trait with deletion methods.
ee/tabby-db/src/ingestion.rs Adds SQL queries for deleting ingested documents.
crates/tabby-index/* Adjusts indexer and structured doc garbage collector implementations to accommodate deletion by ingestion status.
Comments suppressed due to low confidence (2)

ee/tabby-webserver/src/service/ingestion.rs:114

  • [nitpick] Consider renaming the parameters 'source' and 'id' to 'source_name' and 'doc_id' (or similar) for clarity, as 'source' is transformed to a source_id via source_id_from_name later in the function.
async fn delete(&self, source: String, id: String) -> Result<()> {

crates/tabby-index/src/indexer.rs:334

  • [nitpick] Since iter_ids now returns a tuple (source, id) instead of a single id, consider updating the function name or adding a comment to clearly communicate the new return type to avoid potential confusion in downstream consumers.
pub fn iter_ids(&self) -> impl Stream<Item = (String, String)> + '_ {

@zwpaper zwpaper merged commit e208810 into main May 6, 2025
4 of 6 checks passed
@zwpaper zwpaper deleted the feat/ingestion-gc branch May 6, 2025 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0