feat(override): allow addendum data to override imported place data #1494

blackmad · 2020-10-12T12:29:39Z

This is part of a workaround for the scoring issues we've seen in Pelias on records that have multiple names. Elasticsearch calculates the term frequency over all of a record's names, so a place like ["Sutter St Cafe", "The Sutter St Cafe"] is more likely to match "Sutter St" than a record with just the name ["Sutter St"].
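
To make the term frequency point concrete, here is a toy sketch. It only counts how often a query token appears across all of a record's names; it is not Elasticsearch's actual scoring formula (it ignores length normalisation and inverse document frequency), and `termFrequency` is a hypothetical helper introduced for illustration.

```javascript
// Toy illustration only: count occurrences of a term across all names of a record.
// This is NOT how Elasticsearch scores documents; it just shows why repeated
// tokens in multiple names inflate the raw term count.
function termFrequency(names, term) {
  const wanted = term.toLowerCase();
  return names
    .flatMap((name) => name.toLowerCase().split(/\s+/))
    .filter((token) => token === wanted).length;
}

const multiName = termFrequency(['Sutter St Cafe', 'The Sutter St Cafe'], 'sutter');
const singleName = termFrequency(['Sutter St'], 'sutter');

console.log(multiName, singleName); // the multi-name record counts "sutter" twice
```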

This change allows us to create multiple records, one per name, and to specify in the addendum data what the canonical name (or any other data) is. That way we can generate lots of aliases for records, such as adding "Restaurant" to every restaurant venue in our data or aliasing "subway" and "sub way", without worrying too much about the term frequency scoring issues.
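
A minimal sketch of what the import side of this approach could look like, assuming a simplified record shape. The `expandAliases` helper, the `id` suffix scheme, and the use of `JSON.stringify` as the addendum encoding are all assumptions for illustration, not part of this PR.

```javascript
// Hypothetical importer helper: emit one record per name, each carrying an
// addendum.override that points back at the canonical name.
// JSON.stringify stands in for whatever encoding the importer really uses.
function expandAliases(place, aliases) {
  const canonical = place.name.default;
  return [canonical, ...aliases].map((name, index) => ({
    ...place,
    id: `${place.id}:${index}`, // assumed id scheme so records stay distinct
    name: { default: name },    // exactly one name per indexed record
    addendum: {
      ...place.addendum,
      override: JSON.stringify({ name: { default: canonical } })
    }
  }));
}

const records = expandAliases(
  { id: 'venue:1', name: { default: 'Sutter St Cafe' } },
  ['The Sutter St Cafe', 'Sutter St Cafe Restaurant']
);
```

Each record is indexed under a single name, so no tokens are repeated within a record, while the override lets the API present the canonical name at serving time.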

It's hard to say what the impact of this is: in our Pelias setup we generate aliases outside of Pelias trunk, on commercial data. Without this change, when we created aliases (such as turning all "AA" tokens into "A A", or generating compound/decompound mappings for words like coffeeshop<->coffee shop) as a single record with multiple names, we saw some regressions due to repeated tokens. With this change in approach, we are not seeing any regressions.

I would like to get this change into Pelias master (rather than applying this hack at our application server level) so that I can continue performing side-by-side evaluations at the Pelias API level on new indexes and accurately see what we are trying to show to users.

I am open to other approaches to this.

test/unit/middleware/parseBBox.js

middleware/applyOverrides.js

missinglink · 2020-10-12T14:03:02Z

middleware/applyOverrides.js

+  if (place.addendum && place.addendum.override) {
+    try {
+      const overrideData = codec.decode(place.addendum.override);
+      place = _.merge(place, overrideData);


This command is very powerful because it bypasses all the setter logic baked into pelias/model which ensures validity and type correctness.

It would be fairly easy to do something silly, like putting a string where an array was expected or vice versa, which would cause a runtime error and return an HTTP 500.
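
The failure mode described above can be sketched as follows. `deepMerge` is a simplified stand-in for lodash's `_.merge`, and the `category` field and override payload are hypothetical examples, not taken from the PR.

```javascript
// Simplified stand-in for _.merge: recurse into plain objects, otherwise overwrite.
function deepMerge(target, source) {
  const isPlain = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);
  for (const [key, value] of Object.entries(source)) {
    target[key] = isPlain(value) && isPlain(target[key])
      ? deepMerge(target[key], value)
      : value;
  }
  return target;
}

// The indexed record holds an array, but the override supplies a string.
const place = { layer: 'venue', category: ['food', 'cafe'] };
const badOverride = JSON.parse('{"category": "food"}');
deepMerge(place, badOverride);

// Downstream code that assumes an array now throws at runtime.
let error = null;
try {
  place.category.map((c) => c.toUpperCase());
} catch (e) {
  error = e;
}
console.log(error instanceof TypeError);
```

Because the merge never goes through the model's setters, nothing rejects the wrong type until some later code path relies on it.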

That's true. I'm not sure how I could easily reuse the pelias-model logic. It doesn't seem to have a concept of loading/validating an object, so I don't know how I could use it at import or serving time to guarantee correctness.

Another approach would be to add a "displayName" property to pelias-model, which seems almost more fragile since there would be so many places where I'd have to decide whether to use name or displayName.

Do you have any better ideas on how this could be implemented?

@orangejulius and I just discussed this in our call today.

It's a very powerful tool, so we're happy to merge this as an undocumented feature with the disclaimer that it can break things and errors made in the addendum will require a re-index to fix.

With that in mind, I prefer the flexibility of addendum.override over displayName, as it potentially allows prototyping other features in the future without changing code.

middleware/applyOverrides.js

missinglink

Looks good to me 👍
It's super useful when required but also a no-op when addendum.override isn't specified.
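
A rough sketch of the no-op behaviour, under stated assumptions: the `(req, res, next)` signature with results in `res.data`, `JSON.parse` as the decoder, the `mergeDeep` helper, and the choice to leave a record untouched when decoding fails are all illustrative guesses, not the merged implementation.

```javascript
const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

// Simplified recursive merge standing in for _.merge.
function mergeDeep(target, source) {
  for (const [key, value] of Object.entries(source)) {
    target[key] = isPlainObject(value) && isPlainObject(target[key])
      ? mergeDeep(target[key], value)
      : value;
  }
  return target;
}

function applyOverride(place) {
  const encoded = place && place.addendum && place.addendum.override;
  if (!encoded) { return place; } // no override present: strict no-op
  try {
    const overrideData = JSON.parse(encoded); // assumed encoding
    return isPlainObject(overrideData) ? mergeDeep(place, overrideData) : place;
  } catch (err) {
    return place; // assumption: a malformed override leaves the record as indexed
  }
}

// Assumed middleware shape: results live in res.data.
function applyOverrides(req, res, next) {
  if (res && Array.isArray(res.data)) {
    res.data = res.data.map(applyOverride);
  }
  next();
}

const plain = { name: { default: 'Sutter St' } };
const alias = {
  name: { default: 'The Sutter St Cafe' },
  addendum: { override: JSON.stringify({ name: { default: 'Sutter St Cafe' } }) }
};
const broken = { name: { default: 'Sub Way' }, addendum: { override: '{not json' } };

const res = { data: [plain, alias, broken] };
applyOverrides({}, res, () => {});
```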

…1494) * proof of concept for having addendums that override indexed fields * feat(override): allow addendum data to override imported place data * use _.has

David Blackman added 2 commits October 5, 2020 12:56

proof of concept for having addendums that override indexed fields

5f98ca8

blackmad requested review from orangejulius and missinglink October 12, 2020 12:29

missinglink reviewed Oct 12, 2020

use _.has

65cd351

missinglink reviewed Oct 12, 2020

missinglink approved these changes Oct 12, 2020

missinglink merged commit b57bbd4 into master Oct 14, 2020

missinglink deleted the apply-overrides branch October 14, 2020 08:05

missinglink mentioned this pull request May 14, 2025

add applyOverrides middleware to place endpoint #1695

Merged
