8000 feat(override): allow addendum data to override imported place data by blackmad · Pull Request #1494 · pelias/api · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

feat(override): allow addendum data to override imported place data #1494

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 14, 2020

Conversation

blackmad
Copy link
Contributor

This is part of a workaround for the issues we've seen in pelias with scoring on records that have multiple names. The issue is that elasticsearch calculates the term frequency over all the names, so a place like ["Sutter St Cafe", "The Sutter St Cafe"] is more likely to match "Sutter St" than a record with just the name ["Sutter St"]

This change allows us to create multiple records, one per name, but to specify in the addendum data what the canonical name (or any other data) is, so that we can generate lots of aliases for records and not worry too much about the term frequency scoring issues, such as adding "Restaurant" to every restaurant venue in our data, or aliasing "subway" and "sub way".

It's hard to say what the impact of this is - in our pelias set up we are generating aliases outside of pelias trunk, on commercial data. Without this change, creating aliases (like turning all "AA" tokens into "A A", generating compound/decompound mappings for words like coffeeshop<->coffee shop) we were seeing some regressions when only creating one record with multiple names, due to repeated tokens. With this change in approach, we are not seeing any regressions.

I would like to get this change into pelias master (rather than applying this hack at our application server level) so that I can continue performing side-by-side evaluations at the pelias api level on new indexes and accurately seeing what we are trying to show to users.

I am open to other approaches to this.

David Blackman added 2 commits October 5, 2020 12:56
This is a workaround for the issues we've seen in pelias with scoring on records that have multiple names. The issue is that elasticsearch calculates the term frequency over all the names, so a place like ["Sutter St Cafe", "The Sutter St Cafe"] is more likely to match "Sutter St" than a record with just the name ["Sutter St"]

This change allows us to create multiple records, one per name, but to specify in the addendum data what the canonical name is, so that we can generate lots of aliases for records and not worry too much about the term frequency scoring issues, such as adding "Restaurant" to every restaurant venue in our data, or aliasing "subway" and "sub way".

I am open to other approaches to this.
if (place.addendum && place.addendum.override) {
try {
const overrideData = codec.decode(place.addendum.override);
place = _.merge(place, overrideData);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command is very powerful because it bypasses all the setter logic baked into pelias/model which ensures validity and type correctness.

It would be fairly easy to do something silly like put a string where an array was expected for vice versa which would cause a runtime error and return a HTTP 500

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. I'm not sure how I could easily reuse the pelias-model logic it seems like it doesn't have a concept of loading/validating an object, so I don't know how I could use it at import or serving time to guarantee correctness.

Another approach would be to add a "displayName" property to pelias-model which seems almost more fragile since there would be so many places to decide if I should use name or displayName.

Do you have any better ideas on how this could be implemented?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orangejulius and I just discussed this in our call today.

It's a very powerful tool, so we're happy to merge this as an undocumented feature with the disclaimer that it can break things and errors made in the addendum will require a re-index to fix.

With that in mind I prefer the flexibility of addendum.override than displayName as it potentially allows prototyping other features in the future without changing code.

Copy link
Member
@missinglink missinglink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍
It's super useful when required but also a no-op when addendum.override isn't specified.

@missinglink missinglink merged commit b57bbd4 into master Oct 14, 2020
@missinglink missinglink deleted the apply-overrides branch October 14, 2020 08:05
missinglink pushed a commit that referenced this pull request Oct 15, 2020
…1494)

* proof of concept for having addendums that override indexed fields

* feat(override): allow addendum data to override imported place data

* use _.has
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0