Closed
Description
Describe the bug
When running Nexmark q5, the RisingWave ComputeNode runs out of memory (OOM) after about 30 minutes.
To Reproduce
The bug emerges in an EKS environment, but should also happen in a local environment.
Use nexmark-bench to generate data through Kafka.
Expected behavior
No response
Additional context
Based on metrics, the OOM happens in both of the two hash aggregation fragments (max and count).
Here is the Nexmark q5 query:
CREATE MATERIALIZED VIEW nexmark_q5
AS
SELECT AuctionBids.auction,
       AuctionBids.num
FROM (SELECT bid.auction,
             count(*) AS num,
             window_start AS starttime
      FROM HOP(bid, date_time, INTERVAL '2' SECOND, INTERVAL '10' SECOND)
      GROUP BY window_start,
               bid.auction) AS AuctionBids
JOIN (SELECT max(CountBids.num) AS maxn,
             CountBids.starttime_c
      FROM (SELECT count(*) AS num,
                   window_start AS starttime_c
            FROM HOP(bid, date_time, INTERVAL '2' SECOND, INTERVAL '10' SECOND)
            GROUP BY bid.auction,
                     window_start) AS CountBids
      GROUP BY CountBids.starttime_c) AS MaxBids
ON AuctionBids.starttime = MaxBids.starttime_c AND
   AuctionBids.num >= MaxBids.maxn;
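
For anyone reading the query, it may help to see how many windows a single bid lands in under HOP with a 2-second slide and a 10-second size. The sketch below is illustrative only and is not RisingWave code: `hop_windows` is a hypothetical helper, and it assumes windows are half-open `[start, start + size)` and aligned to the Unix epoch. It is not a diagnosis of the OOM; it only shows that each row contributes to size / slide = 5 window groups in the count aggregation.

```python
from datetime import datetime, timedelta


def hop_windows(ts, slide, size):
    """Return the (window_start, window_end) pairs that contain ts.

    Hypothetical helper. Assumes half-open windows [start, start + size)
    whose starts are multiples of `slide` counted from the Unix epoch.
    """
    epoch = datetime(1970, 1, 1)
    # Start of the latest window that begins at or before ts.
    last_start = ts - (ts - epoch) % slide
    windows = []
    start = last_start
    # Walk backwards one slide at a time while the window still covers ts.
    while start + size > ts:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


# Same parameters as the HOP call in q5: slide 2 s, size 10 s.
bid_time = datetime(2023, 1, 1, 0, 0, 7)
windows = hop_windows(bid_time, timedelta(seconds=2), timedelta(seconds=10))
for window_start, window_end in windows:
    print(window_start, "->", window_end)
print(len(windows))  # 5 windows for this bid
```

Under these assumptions, every bid updates five (window_start, auction) groups in each of the two count aggregations.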