-
Notifications
You must be signed in to change notification settings - Fork 561
I want to add a metric type for sum distinct
#4329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Is this issue still relevant? There have been no updates for 60 days, please close the issue or keep the conversation going! |
Is this issue still relevant? There have been no updates for 60 days, please close the issue or keep the conversation going! |
Just requested by a partner. |
I'd love to request this feature as well! |
this would be such a great addition! |
Very relevant, I want it too ! |
@TuringLovesDeathMetal told me a workaround to calculate something like https://cloud.google.com/looker/docs/best-practices/understanding-symmetric-aggregates TL;DRI figured out that the symmetric aggregation on Lightdash with BigQuery is difficult, because overflow happens when summing the values. Background
What I triedI tried the symmetric aggregation with the Step 1: Basic ImplementationI started with the following basic implementation. meta:
metrics:
sum_distinct_order_amount:
type: number
sql: |
${sum_distinct_order_amount_helper1} - ${sum_distinct_order_amount_helper2}
sum_distinct_order_amount_helper1:
type: sum
hidden: true
sql: |
DISTINCT(${amount} + FARM_FINGERPRINT(${order_id}))
sum_distinct_order_amount_helper2:
type: sum
hidden: true
sql: |
DISTINCT(FARM_FINGERPRINT(${order_id})) Step 2: Casting to NUMERICSo, I tried casting the values to NUMERIC to avoid the overflow issue. According to my research, the value of the meta:
metrics:
sum_distinct_order_amount:
type: number
sql: |
${sum_distinct_order_amount_helper1} - ${sum_distinct_order_amount_helper2}
sum_distinct_order_amount_helper1:
type: sum
hidden: true
sql: |
DISTINCT(${amount} + SAFE_CAST(FARM_FINGERPRINT(${order_id}) AS NUMERIC))
sum_distinct_order_amount_helper2:
type: sum
hidden: true
sql: |
DISTINCT(SAFE_CAST(FARM_FINGERPRINT(${order_id}) AS NUMERIC)) Step 3: Using smaller hash valuesI assumed that the value of the However, if the results by dividing the values are not sure to unique meta:
metrics:
sum_distinct_order_amount:
type: number
sql: |
${sum_distinct_order_amount_helper1} - ${sum_distinct_order_amount_helper2}
sum_distinct_order_amount_helper1:
type: sum
hidden: true
sql: |
DISTINCT(${amount} + SAFE_CAST(FARM_FINGERPRINT(${order_id}) AS NUMERIC) / 123456789.0)
sum_distinct_order_amount_helper2:
type: sum
hidden: true
sql: |
DISTINCT(SAFE_CAST(FARM_FINGERPRINT(${order_id}) AS NUMERIC) / 123456789.0) |
This issue was mentioned by a user in slack: https://lightdash.slack.com/archives/C04KD95TF0U/p1725476387180909?thread_ts=1725476387.180909&cid=C04KD95TF0U |
This issue was mentioned by a user in slack: https://lightdash-community.slack.com/archives/C03MG2VTR08/p1727705443187999?thread_ts=1727705443.187999&cid=C03MG2VTR08 |
This issue was mentioned by a user in slack: https://lightdash-community.slack.com/archives/C067S48BHRP/p1728016581620219?thread_ts=1728016581.620219&cid=C067S48BHRP |
A workaround to virtually calculate sum distinct count in LightdashI figured out a workaround to virtually implement sum distinct count in Lightdash with BigQuery. It isn't a perfect solution, as there might be some limitation. For instance, filtering by element(s) in the unnested field doesn't work. BigQuery offers
There are two approaches to add a dimension with -- The original data doesn't have duplicated rows
WITH jobs AS (
SELECT
1 AS job_id,
TIMESTAMP('2024-01-01T00:00:00Z') AS creation_time,
ARRAY<STRING>['a', 'b'] AS tags,
UNION ALL
SELECT
2 AS job_id,
TIMESTAMP('2024-01-02T00:00:00Z') AS creation_time,
ARRAY<STRING>['a', 'c'] AS tags,
UNION ALL
SELECT
3 AS job_id,
TIMESTAMP('2024-01-03T00:00:00Z') AS creation_time,
ARRAY<STRING>[] AS tags,
)
-- We have to expand the ARRAY field into separate rows by the tag.
-- That means we have to create duplicate rows for the same job_id.
SELECT
ROW_NUMBER() OVER (PARTITION BY jobs.job_id ORDER BY jobs.creation_time) AS job_id_row_number,
jobs.job_id,
jobs.creation_time,
tag,
FROM jobs AS jobs
LEFT JOIN UNNEST(jobs.tags) AS tag columns:
- name: job_id_row_number
meta:
dimension:
type: number
label: job_id_row_number |
This issue was mentioned by a user in slack: https://lightdash-community.slack.com/archives/C067S48BHRP/p1737388379511399?thread_ts=1737388379.511399&cid=C067S48BHRP |
This issue was mentioned by a user in slack: https://lightdash.slack.com/archives/C08RYCQJJCV/p1747079419585859?thread_ts=1747079419.585859&cid=C08RYCQJJCV |
Uh oh!
There was an error while loading. Please reload this page.
Description
I want to be able to create a sum of values that are based on a distinct list of items.
For example, I have a table like this:

I want to create a sum of all item costs, based on the distinct item number. So, my
sum_distinct
for this table would be 15.I would have to define the field I wanted to be
distinct
on for the metric, then define the dimension I'm summing. So the .yml could look something like this:The SQL for this metric would look something like this (we'd probably need to use hashing functions):
Acceptance criteria
sum distinct
metrics in my .ymldistinct_key
valuesdistinct_key
can take a list or a single valueThe text was updated successfully, but these errors were encountered: