avoid heatmaps getting so hot they explode by pauldambra · Pull Request #23620 · PostHog/posthog · GitHub

avoid heatmaps getting so hot they explode #23620


Closed
wants to merge 1 commit into from

Conversation

pauldambra
Member

we ingest heatmap data as an array property on an event
that array can have tens of items in it

when the event hits the plugin server
we take the array of heatmap data from the single event
do some validation and transformation
then write each item from the array onto a topic to be written to ClickHouse

so if you have a big burst of traffic and we take millions of views, we'll write tens if not hundreds of millions of events to kafka

this is avoidable load on kafka
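As a back-of-envelope sketch of that fan-out (the burst size and items-per-event figures below are illustrative assumptions, not measured values), the difference between one message per item and one message per event looks roughly like this:

```python
# Back-of-envelope sketch; every number here is an illustrative assumption.
def kafka_messages(events: int, items_per_event: int, batched: bool) -> int:
    """Messages produced to Kafka for a burst of events carrying heatmap data."""
    # Per-item behaviour: each heatmap item becomes its own message.
    # Batched behaviour: each event becomes one message carrying the whole array.
    return events if batched else events * items_per_event


burst = 5_000_000  # hypothetical burst of events
items = 20         # hypothetical "tens of items" per event

per_item = kafka_messages(burst, items, batched=False)
per_event = kafka_messages(burst, items, batched=True)
print(f"per-item: {per_item:,} messages, per-event: {per_event:,} messages")
```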

we know that heatmap data when serialized is less than 1MB or it would not get to plugin server in the first place


i started out aiming to write an array of items in an event and use arrayJoin in the materialized view reading from the kafkaTable

but...

if we use JSONAsString instead of JSONEachRow

we can send an array of json items
clickhouse writes each to the kafka table in a single string column
and then we can use JSONExtract to read from those strings in the materialized view
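A minimal sketch of the producer side of this idea, assuming the plugin server would serialize all items from one event into a single JSON array. The function name, the item fields, and the exact byte limit are hypothetical and only illustrate the shape of the payload:

```python
import json

# Assumption: stands in for the roughly 1MB ceiling mentioned above.
MAX_MESSAGE_BYTES = 1_000_000


def build_heatmap_payload(items: list[dict]) -> bytes:
    """Serialize every heatmap item from one event as a single JSON array."""
    payload = json.dumps(items, separators=(",", ":")).encode("utf-8")
    if len(payload) > MAX_MESSAGE_BYTES:
        raise ValueError(f"payload is {len(payload)} bytes, over the assumed limit")
    return payload


# Hypothetical item shape, for illustration only.
items = [
    {"x": 10, "y": 20, "type": "click"},
    {"x": 11, "y": 22, "type": "rageclick"},
]
payload = build_heatmap_payload(items)  # one Kafka message instead of len(items)
print(payload.decode("utf-8"))
```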


dumped as a draft to get feedback

@pauldambra pauldambra requested a review from a team as a code owner July 10, 2024 21:44
Comment on lines +12 to +16
CREATE TABLE IF NOT EXISTS {table_name} ON CLUSTER '{cluster}'
(
kafka_payload String
) ENGINE = {engine}
"""
Member Author

so, with JSONAsString if we send [{"a": 1}, {"a": 2}, {"a": 3}]

the kafka table will have three rows

"{\"a\": 1}"
"{\"a\": 2}"
"{\"a\": 3}"
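To make the row splitting concrete, here is a toy Python stand-in for that behaviour. It is not ClickHouse code: `simulate_json_as_string` is a hypothetical helper, and the final list comprehension only approximates what a `JSONExtract` call over `kafka_payload` would do in the materialized view. ClickHouse keeps the original text of each object, so whitespace may differ from what `json.dumps` produces here.

```python
import json


def simulate_json_as_string(message: str) -> list[str]:
    """Toy stand-in for JSONAsString: one string row per top-level JSON object."""
    parsed = json.loads(message)
    objects = parsed if isinstance(parsed, list) else [parsed]
    return [json.dumps(obj) for obj in objects]


# One Kafka message containing an array of three objects.
rows = simulate_json_as_string('[{"a": 1}, {"a": 2}, {"a": 3}]')

# Rough equivalent of extracting the "a" field from each kafka_payload row.
values = [json.loads(row)["a"] for row in rows]

for row in rows:
    print(row)
```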

Contributor
@benjackwhite benjackwhite left a comment

This seems like a super clean solution. Not the authority on whether it works but feels like a great incremental step forward to alleviate some issues

@fuziontech
Member
fuziontech commented Jul 12, 2024

I love what you are doing here in terms of reducing the load on Kafka, but from a consumer side I don't think there is very much of an increase in load and from a producer side (emitting all these messages) there is definitely an increase, but I think in general we can tune the cost using linger.ms:
https://docs.confluent.io/platform/current/installation/configuration/producer-configs.html#linger-ms

I'd prefer to tune that than add complexity to CH, but I'm definitely not against it! Just curious about alternate solutions that might treat this issue across other producers.
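For illustration, a producer config along those lines might look like the sketch below. The keys are standard librdkafka-style producer settings, but the broker address and every value are assumptions chosen for the example, not settings taken from PostHog.

```python
# Hypothetical producer settings; all values are illustrative assumptions.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "linger.ms": 50,               # wait up to 50 ms so more messages share a batch
    "batch.num.messages": 10_000,  # upper bound on messages per batch
    "compression.type": "lz4",     # compress each batch before sending
}

for key, value in producer_config.items():
    print(f"{key}={value}")
```

A longer linger trades a little latency for fewer, larger requests to the brokers. The number of messages written to the topic stays the same.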

@pauldambra
Member Author

> from a consumer side I don't think there is very much of an increase in load

yep... i'm always impressed by just how much data ClickHouse can ingest. That's either never a problem or so rarely a problem i'd try to ignore it for a long time

> from a producer side (emitting all these messages) there is definitely an increase,

just to check (cos I didn't follow 🧠) this approach would reduce load on the producer since we'd emit 1 message instead of n messages

did you mean that kafka should be able to handle more than 10x the current load with tuning, or...?


definitely don't want to add a new ingestion mechanism unnecessarily!

@posthog-bot
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@posthog-bot
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week. If you want to permanently keep it open, use the waiting label.

@pauldambra pauldambra added waiting Prevents stale-bot from marking the PR as stale. and removed stale labels Jul 30, 2024
@marandaneto marandaneto marked this pull request as draft November 22, 2024 09:27
@marandaneto marandaneto changed the title draft: avoid heatmaps getting so hot they explode avoid heatmaps getting so hot they explode Nov 22, 2024
@pauldambra
Member Author

i don't have kreplets to push this forward 🙈

@pauldambra pauldambra closed this Jan 27, 2025
Labels
waiting Prevents stale-bot from marking the PR as stale.