handle duplicate features consistently across ML implementations · Issue #348 · ClearTK/cleartk · GitHub

handle duplicate features consistently across ML implementations #348

Open
bethard opened this issue Apr 15, 2015 · 2 comments

Comments

@bethard
Contributor
bethard commented Apr 15, 2015

Original issue 350 created by ClearTK on 2013-03-01T09:24:48.000Z:

As discussed on the mailing list, different feature encoders do different things when encountering duplicate features:

https://groups.google.com/d/topic/cleartk-users/B2cfZSUX7W0/discussion

For example, FeatureVectorFeaturesEncoder adds together the counts for identical feature names, NameNumberFeaturesEncoder produces duplicate NameNumber pairs, and FeatureNodeArrayEncoder throws away all but the last value.
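
To make the difference concrete, here is a minimal Python sketch of the three behaviours described above. It is not ClearTK code: the function names and the sample feature list are hypothetical, and each function only mimics what the corresponding encoder is reported to do with a repeated feature name.

```python
# Hypothetical input: (name, value) pairs in which "word=the" appears twice.
features = [("word=the", 1.0), ("suffix=he", 1.0), ("word=the", 1.0)]


def sum_duplicates(pairs):
    """Add the values of identical names together
    (the behaviour described for FeatureVectorFeaturesEncoder)."""
    totals = {}
    for name, value in pairs:
        totals[name] = totals.get(name, 0.0) + value
    return totals


def keep_duplicates(pairs):
    """Pass every pair through unchanged, repeats included
    (the behaviour described for NameNumberFeaturesEncoder)."""
    return list(pairs)


def keep_last(pairs):
    """Keep only the last value seen for each name
    (the behaviour described for FeatureNodeArrayEncoder)."""
    last = {}
    for name, value in pairs:
        last[name] = value
    return last
```

Under these assumptions the same input yields a value of 2.0 for "word=the" in the first case, two separate "word=the" entries in the second, and a single 1.0 in the third, so a classifier would see three different encodings of the same instance.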

All the feature encoders should do the same thing. A few options:

  • Add the values together, as FeatureVectorFeaturesEncoder does, though this doesn't make much sense for Boolean-valued features.
  • Throw an exception, requiring the annotator to de-duplicate. This is probably the conceptually simplest option, but it could require substantially more work from the annotator.
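
As a rough illustration of the second option, the sketch below rejects any repeated feature name. It is written in Python for brevity and does not use any ClearTK API; `DuplicateFeatureError` and `encode_strict` are hypothetical names.

```python
class DuplicateFeatureError(ValueError):
    """Hypothetical error raised when a feature name appears more than once."""


def encode_strict(pairs):
    """Build a name -> value mapping, refusing any repeated name.

    The caller (the annotator, in the issue's terms) is responsible
    for de-duplicating features before encoding.
    """
    encoded = {}
    for name, value in pairs:
        if name in encoded:
            raise DuplicateFeatureError(f"duplicate feature name: {name!r}")
        encoded[name] = value
    return encoded
```

The appeal of this design is that every encoder would behave identically by construction. The cost is that each annotator has to guarantee unique names up front.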

Beyond true duplicates, we also need to decide what to do when two features have the same name but different values.
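
Whatever policy is chosen, the two cases first have to be told apart. The following Python sketch shows one way that could be done; `find_repeats` is a hypothetical helper, not part of ClearTK, and it assumes feature values are hashable (numbers or strings).

```python
from collections import defaultdict


def find_repeats(pairs):
    """Split repeated feature names into two groups.

    Returns (true_duplicates, conflicts), where each maps a name to the
    list of values seen for it:
      - true_duplicates: the name repeats and every value is identical
      - conflicts: the name repeats with at least two different values
    Names that occur only once appear in neither mapping.
    """
    values_by_name = defaultdict(list)
    for name, value in pairs:
        values_by_name[name].append(value)

    true_duplicates = {}
    conflicts = {}
    for name, values in values_by_name.items():
        if len(values) < 2:
            continue
        if len(set(values)) == 1:
            true_duplicates[name] = values
        else:
            conflicts[name] = values
    return true_duplicates, conflicts
```

With this split, an encoder could apply one rule to true duplicates (for example, collapse them) and a different one to conflicts (for example, raise an error). Which rules to apply is exactly the open question here.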

@bethard
Contributor Author
bethard commented Apr 15, 2015

Comment #1 originally posted by ClearTK on 2013-05-03T08:44:33.000Z:

<empty>

@bethard
Contributor Author
bethard commented Apr 15, 2015

Comment #2 originally posted by ClearTK on 2014-03-15T17:41:52.000Z:

<empty>

@bethard modified the milestone: 2.2 (Apr 16, 2015)
@reckart modified the milestones: 3.0.0, ⭐️ Feature backlog (Nov 4, 2022)
@reckart added the 🐛 Bug label and removed the Type-Defect label (Nov 4, 2022)