iceberg: Reduce Storage Footprint for Append-only Use Cases · Issue #21586 · risingwavelabs/risingwave

Closed
Tracked by #19418
xxchan opened this issue Apr 27, 2025 · 2 comments
xxchan commented Apr 27, 2025

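With the help of the Iceberg streaming source (#20955), the Hummock table can be reduced to ingestion only, and downstream MVs can read from the Iceberg table instead.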

Current architecture

flowchart TB
  subgraph Iceberg
    IT[Iceberg Table]
  end

  subgraph "Iceberg Table Engine"
    direction TB
    subgraph HT[Hummock Table]
        direction LR
        SM[Materialize]
        SR[RowIdGen]
        SU[Union]

        SS1["Source(connector)"]

        SD[Source+Dml]

        SS1 -->|exchange| SU

        SD -->|exchange| SU
        SU --> SR --> SM
    end

    ISnk[Iceberg Sink]
    
    ISrc[Iceberg Source]

    HT -->|events| ISnk
    ISnk -->|periodic commit| IT
    ISrc -.->|scan| IT
  end

  subgraph Producer
    Con[Connector]
    Dml[INSERT]
  end


  
  Con --> SS1
  Dml --> SD

  HT -->|events from dispatcher| MV[Downstream MV]
  Query[Ad-hoc Query] -.->|scan| ISrc

  classDef component fill:#f9f,stroke:#333,stroke-width:2px;
  classDef external  fill:#bbf,stroke:#333,stroke-width:1px;
  classDef highlight fill:#5CE1E6,stroke:#333,stroke-width:1px;
  classDef muted     fill:#D9D9D9,stroke:#333,stroke-width:1px;

  class HT,ISnk,IT,ISrc component;
  class MV,Query external;
  class IT highlight;
  class ISrc muted;

With streaming Iceberg source

flowchart TB
  subgraph Iceberg
    IT[Iceberg Table]
  end

  subgraph "Iceberg Table Engine"
    direction TB
    subgraph HT["Hummock Table(ingestion only)"]
        direction LR
        SR[RowIdGen]
        SU[Union]

        SS1["Source(connector)"]

        SD[Source+Dml]

        SS1 -->|exchange| SU

        SD -->|exchange| SU
        SU --> SR
    end

    ISnk[Iceberg Sink]
    
    ISrc[Iceberg Source]

    HT -->|events| ISnk
    ISnk -->|periodic commit| IT
    ISrc -.->|scan| IT
  end

  subgraph Producer
    Con[Connector]
    Dml[INSERT]
  end


  
  Con --> SS1
  Dml --> SD

  subgraph MV["Downstream MV"]
     subgraph StreamingSource["Iceberg streaming source"]
         IcebergList --> IcebergFetch
     end
     StreamingSource --> Nodes
  end
  ISrc -.-> |instantiated as| StreamingSource
  StreamingSource -.->|snapshot+incremental scan| IT

  
  Query[Ad-hoc Query] -.->|scan| ISrc

  classDef component fill:#f9f,stroke:#333,stroke-width:2px;
  classDef external  fill:#bbf,stroke:#333,stroke-width:1px;
  classDef highlight fill:#5CE1E6,stroke:#333,stroke-width:1px;
  classDef muted     fill:#D9D9D9,stroke:#333,stroke-width:1px;

  class HT,ISnk,IT,ISrc component;
  class MV,Query external;
  class IT highlight;
  class ISrc muted;
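
To make the `IcebergList --> IcebergFetch` pair and the "snapshot+incremental scan" edge more concrete, here is a minimal Python sketch. It is a toy model, not RisingWave's executor: `ToyIcebergTable`, `ToyListFetchSource`, and the file names are hypothetical, and it assumes an append-only table where each commit only adds data files.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIcebergTable:
    """Append-only table: each commit (snapshot) adds one or more data files."""
    snapshots: list = field(default_factory=list)  # list of {file_name: rows}

    def commit(self, files):
        self.snapshots.append(dict(files))


class ToyListFetchSource:
    """Hypothetical list + fetch reader; not RisingWave's actual executor."""

    def __init__(self, table):
        self.table = table
        self.next_snapshot = 0  # index of the first snapshot not yet listed

    def list_new_files(self):
        # "IcebergList": the first call covers every existing snapshot
        # (snapshot scan); later calls only see newer commits (incremental scan).
        new = self.table.snapshots[self.next_snapshot:]
        self.next_snapshot = len(self.table.snapshots)
        return [(name, rows) for snap in new for name, rows in snap.items()]

    def poll(self):
        # "IcebergFetch": read the rows of each newly listed file.
        out = []
        for _name, rows in self.list_new_files():
            out.extend(rows)
        return out


table = ToyIcebergTable()
table.commit({"f1.parquet": [1, 2], "f2.parquet": [3]})
source = ToyListFetchSource(table)
initial = source.poll()          # snapshot scan of existing data
table.commit({"f3.parquet": [4, 5]})
incremental = source.poll()      # only the newly committed file
idle = source.poll()             # nothing new since the last poll
```

In this model the downstream MV needs no copy of the data in Hummock: its only state is the position of the last listed snapshot.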

Alternative solution: Backfill + forward

flowchart TB
  subgraph Iceberg
    IT[Iceberg Table]
  end

  subgraph "Iceberg Table Engine"
    direction TB
    subgraph HT["Hummock Table(ingestion+dispatch)"]
        direction LR
        SR[RowIdGen]
        SU[Union]

        SS1["Source(connector)"]

        SD[Source+Dml]

        SS1 -->|exchange| SU

        SD -->|exchange| SU
        SU --> SR
    end

    ISnk["Iceberg Sink (w/ logstore)"]
    
    ISrc[Iceberg Source]

    HT -->|events| ISnk
    ISnk -->|periodic commit| IT
    ISrc -.->|scan| IT

    logstore
  end

  subgraph Producer
    Con[Connector]
    Dml[INSERT]
  end


  
  Con --> SS1
  Dml --> SD

  subgraph MV["Downstream MV"]
     IcebergBackfill["IcebergBackfill(may need to read logstore in backfill stage)"]
     HT -->|events| logstore --> IcebergBackfill  --> Nodes
  end
  
  IcebergBackfill -.->|snapshot scan| IT

  
  Query[Ad-hoc Query] -.->|scan| ISrc

  classDef component fill:#f9f,stroke:#333,stroke-width:2px;
  classDef external  fill:#bbf,stroke:#333,stroke-width:1px;
  classDef highlight fill:#5CE1E6,stroke:#333,stroke-width:1px;
  classDef muted     fill:#D9D9D9,stroke:#333,stroke-width:1px;

  class HT,ISnk,IT,ISrc component;
  class MV,Query external;
  class IT highlight;
  class ISrc muted;
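
The backfill + forward alternative can be sketched in the same toy style. The function below is hypothetical and rests on one loud assumption: that the boundary between "already committed to Iceberg" and "only in the logstore" is a single offset, `committed_offset`. The issue does not say how the real boundary would be tracked.

```python
def backfill_then_forward(committed_rows, committed_offset, logstore):
    """Toy backfill: snapshot rows first, then log entries not yet committed.

    Assumption: `committed_offset` is the number of log entries the sink has
    already written to the Iceberg table; the issue does not specify how the
    real boundary would be tracked.
    """
    # Backfill stage: snapshot scan of what is already in Iceberg.
    yield from committed_rows
    # Forward stage: replay the part of the logstore the snapshot did not cover.
    for offset, row in enumerate(logstore):
        if offset >= committed_offset:
            yield row


logstore = ["a", "b", "c", "d"]      # events emitted by the Hummock table
committed_rows = ["a", "b"]          # periodic commit has covered offsets 0..1
result = list(backfill_then_forward(committed_rows, 2, logstore))
```

Because the sink commits only periodically, rows such as `"c"` and `"d"` exist only in the logstore at backfill time, which is one reading of why `IcebergBackfill` "may need to read logstore in backfill stage".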
@github-actions github-actions bot added this to the release-2.4 milestone Apr 27, 2025

@BugenZhao BugenZhao changed the title Reduce Storage Footprint for Append-only Use Cases iceberg: Reduce Storage Footprint for Append-only Use Cases Apr 29, 2025
@BugenZhao BugenZhao modified the milestones: release-2.4, release-2.5 May 8, 2025
@xxchan xxchan self-assigned this May 9, 2025
xxchan commented Jun 3, 2025

#21811

@xxchan xxchan closed this as completed Jun 3, 2025