feat: add sortable keys for record linkage by adamdecaf · Pull Request #654 · moov-io/watchman · GitHub

feat: add sortable keys for record linkage #654


Draft
adamdecaf wants to merge 1 commit into master


adamdecaf (Member) commented Jul 2, 2025

The idea is to generate a list of sortable keys (the buckets each field hashes into) so that we can find similar records. You can compare against several of these keys at once and grab the rows that sort just above or below them, which reduces the number of detailed similarity scoring calls we need to make.

"TYPE:0230"
"NAME:0190"

// Country | Type | Identifier
"GOVID:C0173|T0190|X0146"

// Country | State | PostalCode | City | Line1 | Line2 [optional]
"ADDR:C0143|S0021|P0007|Y0023|L0201,0028,0173"

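To make the key layout above concrete, here is a minimal sketch that splits a key into its kind and its bucket segments. The function name `parse_key` is hypothetical and not part of this PR; the sketch only assumes what the examples show: a `KIND:` prefix, `|` between segments, an optional one-letter tag per segment, and comma-separated zero-padded bucket numbers.

```python
def parse_key(key):
    """Split a sortable key into (kind, [(tag, [bucket, ...]), ...]).

    Hypothetical helper: the layout is inferred from the example keys,
    not taken from the PR's implementation.
    """
    kind, _, body = key.partition(":")
    segments = []
    for seg in body.split("|"):
        # A segment may start with a one-letter tag (C, S, P, ...) or have none.
        tag = seg[0] if seg[:1].isalpha() else ""
        buckets = [int(b) for b in seg[len(tag):].split(",")]
        segments.append((tag, buckets))
    return kind, segments


kind, segments = parse_key("ADDR:C0143|S0021|P0007|Y0023|L0201,0028,0173")
print(kind)      # ADDR
print(segments)  # [('C', [143]), ('S', [21]), ('P', [7]), ('Y', [23]), ('L', [201, 28, 173])]
```

Because every bucket is zero-padded to the same width, comparing the raw key strings orders them the same way as comparing the parsed numbers segment by segment.
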
You could then compute traditional string distance metrics over these sortable keys to rank the most similar records. Within each key, the segments run from general data to more specific.
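
As a rough illustration of ranking by string distance, the sketch below uses Levenshtein edit distance, one of several traditional metrics that could fit here. The PR does not say which metric is intended, so the choice of metric, the function names `levenshtein` and `rank_by_distance`, and the candidate keys are all assumptions made for the example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = cur
    return prev[-1]


def rank_by_distance(target, candidates):
    """Return (distance, key) pairs, closest first (hypothetical helper)."""
    return sorted((levenshtein(target, c), c) for c in candidates)


target = "ADDR:C0143|S0021|P0007|Y0023|L0201,0028,0173"
candidates = [
    "ADDR:C0090|S0112|P0311|Y0207|L0044,0190,0002",  # made-up key, different country
    "ADDR:C0143|S0021|P0007|Y0023|L0201,0028,0174",  # made-up key, last bucket differs
    "ADDR:C0143|S0021|P0009|Y0040|L0105,0028,0173",  # made-up key, same state only
]
for distance, key in rank_by_distance(target, candidates):
    print(distance, key)
```

Plain edit distance weighs every character equally, so a difference in the country bucket costs the same as one in the street line. A metric that favors shared prefixes may suit the general-to-specific layout better; that trade-off is left open here.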

With the broad fields on the left, this allows for prefix filtering in SQL. You could strip out the Line1/Line2 data and filter down to the city level, or find the rows near an exact address by grabbing those that sort just above and below the target.
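
Both lookups can be sketched against an in-memory SQLite database. The table name `records`, the column `addr_key`, the helper names, and all rows other than the example key from this description are hypothetical; the real schema and queries in watchman may look quite different.

```python
import sqlite3

# Made-up rows; only the key of row 1 comes from the PR description.
rows = [
    (1, "ADDR:C0143|S0021|P0007|Y0023|L0201,0028,0173"),
    (2, "ADDR:C0143|S0021|P0007|Y0023|L0199,0030,0173"),
    (3, "ADDR:C0143|S0021|P0007|Y0023|L0202,0028,0001"),
    (4, "ADDR:C0143|S0021|P0011|Y0040|L0201,0028,0173"),
    (5, "ADDR:C0090|S0112|P0311|Y0207|L0044,0190,0002"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, addr_key TEXT NOT NULL)")
# An index on the key supports the ordered range scans in neighbors().
conn.execute("CREATE INDEX idx_addr_key ON records (addr_key)")
conn.executemany("INSERT INTO records (id, addr_key) VALUES (?, ?)", rows)


def city_prefix(addr_key):
    """Drop the trailing street-line segment, keeping everything up to the city."""
    return addr_key.rsplit("|", 1)[0] + "|"


def same_city(conn, addr_key):
    """Ids of rows whose key shares the city-level prefix, in key order."""
    cur = conn.execute(
        "SELECT id FROM records WHERE addr_key LIKE ? ORDER BY addr_key",
        (city_prefix(addr_key) + "%",),
    )
    return [r[0] for r in cur]


def neighbors(conn, addr_key, n=1):
    """Ids of the n rows sorting just below and the n rows just above the key."""
    below = conn.execute(
        "SELECT id FROM records WHERE addr_key < ? ORDER BY addr_key DESC LIMIT ?",
        (addr_key, n),
    ).fetchall()
    above = conn.execute(
        "SELECT id FROM records WHERE addr_key > ? ORDER BY addr_key ASC LIMIT ?",
        (addr_key, n),
    ).fetchall()
    return [r[0] for r in reversed(below)] + [r[0] for r in above]


target = rows[0][1]
print(same_city(conn, target))  # [2, 1, 3]
print(neighbors(conn, target))  # [2, 3]
```

The prefix filter narrows candidates to one city, and the neighbor query returns the rows adjacent to the target in sort order. Either result set could then be passed to the detailed similarity scoring.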
