avoid heatmaps getting so hot they explode #23620
Conversation
CREATE TABLE IF NOT EXISTS {table_name} ON CLUSTER '{cluster}'
(
    kafka_payload String
) ENGINE = {engine}
"""
so, with JSONAsString, if we send [{a: 1}, {a: 2}, {a: 3}] as a single message,
the kafka table will have three rows:
"{\"a\": 1}"
"{\"a\": 2}"
"{\"a\": 3}"
This seems like a super clean solution. I'm not the authority on whether it works, but it feels like a great incremental step forward to alleviate some of these issues.
I love what you are doing here in terms of reducing the load on Kafka. From the consumer side I don't think there is much of an increase in load, and from the producer side (emitting all these messages) there is definitely an increase, but I think in general we can tune that cost on the producer. I'd prefer to tune that rather than add complexity to CH, but I'm definitely not against it! Just curious about alternate solutions that might treat this issue across other producers.
yep... i'm always impressed by just how much data ClickHouse can ingest. That's either never a problem, or so rarely a problem that i'd try to ignore it for a long time
just to check (cos I didn't follow 🧠): this approach would reduce load on the producer, since we'd emit 1 message instead of n messages. did you mean that kafka should be able to handle current load x >10 with tuning, or.....? definitely don't want to add a new ingestion mechanism unnecessarily!
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the label.
i don't have kreplets to push this forward 🙈
we ingest heatmap data as an array property on an event
- that array can have tens of items in it
- when the event hits the plugin server, we take the array of heatmap data from the single event, do some validation and transformation, then write each item from the array onto a topic to be written to ClickHouse
- so if you have a big burst of traffic and we take millions of views, we'll write tens if not hundreds of millions of events to kafka (see the arithmetic sketch after this list)
- this is avoidable load on kafka
- we know that heatmap data, when serialized, is less than 1MB, or it would not get to the plugin server in the first place
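To put illustrative numbers on that fan-out (assumed figures, not ones from this PR): 1 million views with ~50 heatmap items each is ~50 million Kafka messages under the current one-message-per-item scheme, versus 1 million messages if each event ships its whole array as a single payload.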
i started out aiming to write the array of items in a single event and use arrayJoin in the materialized view reading from the kafka table, but...
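(for reference, a sketch of roughly what that arrayJoin shape would have looked like; the table and column names are hypothetical, not from this PR)

-- Sketch only: assumes a JSONEachRow Kafka table whose rows carry the whole
-- array in an items column; arrayJoin fans one row out into one row per item.
CREATE MATERIALIZED VIEW heatmap_items_mv TO writable_heatmaps AS
SELECT arrayJoin(items) AS item
FROM heatmap_kafka_table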
if we use JSONAsString instead of JSONEachRow, we can send an array of json items in a single message. clickhouse writes each item to the kafka table as its own row, in a single string column, and then we can use JSONExtract to read from those strings in the materialized view (a sketch follows below).

dumped as a draft to get feedback
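A minimal sketch of that materialized view, assuming a made-up heatmap item schema (session_id, x, y) and table names that are not from this PR:

-- Sketch only: each kafka_payload row holds one JSON heatmap item,
-- so the view extracts typed columns straight out of the string.
CREATE MATERIALIZED VIEW heatmaps_mv TO writable_heatmaps AS
SELECT
    JSONExtractString(kafka_payload, 'session_id') AS session_id,
    JSONExtractInt(kafka_payload, 'x') AS x,
    JSONExtractInt(kafka_payload, 'y') AS y
FROM heatmap_kafka_table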