Bug: When filtering by all time on a Data Warehouse table the startdate is set to the first event in the events table · Issue #33695 · PostHog/posthog · GitHub

Bug: When filtering by all time on a Data Warehouse table the startdate is set to the first event in the events table #33695

Open
darkopia opened this issue Jun 13, 2025 · 2 comments
Labels
bug (Something isn't working right), feature/insights (Feature Tag: Insights overall), team/data-warehouse, team/product-analytics

Comments

Contributor
darkopia commented Jun 13, 2025

Bug Description

from typing import Any, Optional, Union, overload
Based on what I am reading here, to get the first event for an "All events" query we take the first event from the events table. This is problematic when we are looking at data warehouse tables, which might have data with timestamps before the first event.

How to reproduce

  1. Import historic data into data warehouse tables.
  2. Set the "All time" filter against a data warehouse table.
  3. Switch to the "Last 10 years" filter.
  4. If the data warehouse table contains timestamps from before the first event was captured in the events table, the "All time" filter shows fewer events than the "Last 10 years" filter.
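
To make the discrepancy concrete, here is a minimal sketch with made-up data. The timestamps, the `rows_in_range` helper, and the fixed "now" are all hypothetical; the sketch only mimics the behaviour described above and is not PostHog's actual query code.

```python
from datetime import datetime, timezone


def utc(year: int, month: int, day: int) -> datetime:
    """Build a timezone-aware UTC datetime (helper for this sketch only)."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical data: the first captured event is newer than the imported history.
event_timestamps = [utc(2024, 6, 1), utc(2024, 7, 15)]
warehouse_timestamps = [utc(2019, 3, 2), utc(2022, 11, 20), utc(2024, 8, 5)]


def rows_in_range(timestamps: list[datetime], start: datetime) -> list[datetime]:
    """Keep only the rows at or after the start of the date range."""
    return [ts for ts in timestamps if ts >= start]


# "All time" as described in this issue: the start date comes from the events table.
all_time_start = min(event_timestamps)
all_time_rows = rows_in_range(warehouse_timestamps, all_time_start)

# "Last 10 years": the start date is relative to a fixed, assumed "now".
now = utc(2025, 6, 13)
ten_years_start = now.replace(year=now.year - 10)
ten_year_rows = rows_in_range(warehouse_timestamps, ten_years_start)

print(len(all_time_rows), len(ten_year_rows))  # 1 3
```

With this data, "All time" returns one warehouse row while "Last 10 years" returns three, which is the inversion the reproduction steps describe.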

Additional context

From: https://posthoghelp.zendesk.com/agent/tickets/32463

Debug info

Kind: bug

Target area: analytics

Report event: http://go/ticketByUUID/998de076-ff98-4080-a7f8-b92ef8a54ec5

Session: https://us.posthog.com/project/sTMFPsFhdP1Ssg/replay/0197646a-8380-717f-bcc0-859881b83df3?t=260

Exceptions: https://us.posthog.com/project/2/error_tracking?filterGroup=%7B%22type%22%3A%22AND%22%2C%22values%22%3A%5B%7B%22type%22%3A%22AND%22%2C%22values%22%3A%5B%7B%22key%22%3A%22%24session_id%22%2C%22value%22%3A%5B%220197646a-8380-717f-bcc0-859881b83df3%22%5D%2C%22operator%22%3A%22exact%22%2C%22type%22%3A%22event%22%7D%5D%7D%5D%7D

Location: https://us.posthog.com/project/103405/pipeline/sources/managed-01961b10-d956-0000-bf76-93529a719180/schemas

Persons-on-events mode for project: person_id_override_properties_on_events
darkopia added the bug, team/product-analytics, and feature/insights labels Jun 13, 2025
Contributor
aspicer commented Jun 16, 2025

Every time we do an "all time" query, we run a query for get_earliest_timestamp, which runs
SELECT timestamp FROM events WHERE team_id = %(team_id)s AND timestamp > '2015-01-01' ORDER BY timestamp LIMIT 1

This runs quite often and accounts for 1% of query time on our ClickHouse cluster. It also does not work for data warehouse queries, because it only returns when the first event happened.

The naive way to fix this would be to modify the get_earliest_timestamp function so it can handle data warehouse tables (and arbitrary timestamp fields). How would that perform, given how data warehouse tables are stored?
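
As a rough illustration of that naive fix, the sketch below parameterises the query over the table and timestamp field. The function name `build_earliest_timestamp_sql`, the `team_scoped` flag, the `%(earliest_timestamp)s` placeholder, and the `stripe_invoices` table name are all assumptions made for this example; this is not the existing `get_earliest_timestamp` implementation, and whether a warehouse table can be filtered by `team_id` is also an assumption.

```python
import re

# Plain identifiers only, so table and column names can be interpolated safely.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_earliest_timestamp_sql(
    table: str = "events",
    timestamp_field: str = "timestamp",
    team_scoped: bool = True,
) -> str:
    """Build an earliest-timestamp query for an arbitrary table and field.

    Hypothetical helper: values stay as %(name)s placeholders, matching the
    style of the query quoted above, and are bound by the caller.
    """
    for name in (table, timestamp_field):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    conditions = [f"{timestamp_field} > %(earliest_timestamp)s"]
    if team_scoped:
        # Assumption: only the events table is filtered by team_id.
        conditions.insert(0, "team_id = %(team_id)s")
    where = " AND ".join(conditions)

    return (
        f"SELECT {timestamp_field} FROM {table} "
        f"WHERE {where} ORDER BY {timestamp_field} LIMIT 1"
    )


print(build_earliest_timestamp_sql())
print(build_earliest_timestamp_sql("stripe_invoices", "Created_at", team_scoped=False))
```

The defaults reproduce the shape of the current events query, so existing callers would be unaffected, while a data warehouse insight could pass its own table and timestamp column.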

Contributor
phixMe commented Jun 16, 2025

Our data warehouse tables map to parquet files via ClickHouse table functions which are columnar in nature. I took a peek at the parquet metadata for a timestamp column and this column does indeed have the min/max directly stored in it. My understanding is that ClickHouse should be able to do this w/o a full table scan.

...
{
  "PathInSchema": [
    "Created_at"
  ],
  "Type": "INT64",
  "Encodings": [
    "PLAIN",
    "RLE",
    "RLE_DICTIONARY"
  ],
  "CompressedSize": 111,
  "UncompressedSize": 108,
  "NumValues": 5,
  "NullCount": 0,
  "MaxValue": 1744047287878634,
  "MinValue": 1741640068000000,
  "CompressionCodec": "SNAPPY"
}
...
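
For readability, the min/max statistics above can be decoded into dates. This is a small sketch that assumes the INT64 values are microseconds since the Unix epoch; the excerpt does not show the column's logical type, so that unit is a guess based on the magnitude of the numbers. The `micros_to_utc` helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_utc(value: int) -> datetime:
    """Convert integer microseconds since the epoch to an aware UTC datetime.

    Integer timedelta arithmetic avoids the float rounding that
    datetime.fromtimestamp(value / 1e6) could introduce.
    """
    return EPOCH + timedelta(microseconds=value)


# Values copied from the Parquet column metadata shown above.
column_stats = {"MinValue": 1741640068000000, "MaxValue": 1744047287878634}

earliest = micros_to_utc(column_stats["MinValue"])
latest = micros_to_utc(column_stats["MaxValue"])

print(earliest.isoformat())  # 2025-03-10T20:54:28+00:00
print(latest.isoformat())    # 2025-04-07T17:34:47.878634+00:00
```

Under that assumption, the earliest value for this column is available from the file metadata alone, without reading any of the five rows.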
