Identify duplicates / similar learning resources · Issue #882 · mitodl/lore · GitHub
This repository was archived by the owner on Jan 28, 2020. It is now read-only.
Identify duplicates / similar learning resources #882
Open
@pdpinch

Description

As a curator, I would like a way to easily identify and hide (or remove) duplicate learning resources from the repository.

Having imported several versions of the 8.01 physics course, we already have many duplicate learning objects cluttering the repository. It would be good to have a way to identify them (automatically, or by user inspection) and hide (or remove) them in order to declutter the interface.

Some thoughts that have been discussed:

  • create a vocabulary for specifying the relationship between learning objects, e.g. duplicate, version, etc. There is undoubtedly prior art on this.
  • develop a heuristic for identifying duplicates on import, and tag them with said vocabulary
  • give users a way to manually tag related learning objects, for when the automation fails
  • Elasticsearch may help by giving similarity scores for documents
  • when duplicates or versions are identified, there should be a way to synchronize the metadata between the two, to avoid re-entry
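
To make the first two ideas concrete, here is a minimal sketch of an import-time check. Everything in it is an assumption for illustration: the `Relation` vocabulary, the `fingerprint` and `tag_duplicates_on_import` helpers, and the choice to treat resources as plain strings keyed by id are hypothetical and are not taken from the lore codebase. It only catches resources that are identical after whitespace and case are normalized.

```python
import hashlib
from enum import Enum


class Relation(Enum):
    """Hypothetical vocabulary for how two learning resources relate."""
    DUPLICATE = "duplicate"
    VERSION = "version"


def fingerprint(content: str) -> str:
    """Hash the content after collapsing whitespace and lowercasing, so that
    trivial formatting differences do not change the fingerprint."""
    normalized = " ".join(content.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def tag_duplicates_on_import(existing, incoming):
    """Return (incoming_id, existing_id, Relation) tuples for exact matches.

    `existing` and `incoming` are dicts mapping a resource id to its content.
    """
    # Index the resources already in the repository by fingerprint,
    # keeping the first id seen for each fingerprint.
    seen = {}
    for resource_id, content in existing.items():
        seen.setdefault(fingerprint(content), resource_id)

    # Tag each incoming resource whose fingerprint is already known.
    relations = []
    for resource_id, content in incoming.items():
        match = seen.get(fingerprint(content))
        if match is not None:
            relations.append((resource_id, match, Relation.DUPLICATE))
    return relations
```

A curator-facing tool could store the returned tuples and use them to hide the incoming copy or to copy metadata from the existing resource. Anything the hash misses would still need manual tagging or a fuzzier score.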

One tricky aspect is that some minor differences between versions of a learning resource are irrelevant, so those versions can be treated as duplicates, while other small changes, such as changes to problem text, may be significant.
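
As a rough illustration of that trade-off, the sketch below scores a pair of resources with Python's standard `difflib.SequenceMatcher` and maps the score onto the labels discussed above. The `classify_pair` helper and both thresholds are assumptions made up for this example, not tuned values, and this is a stand-in for whatever similarity score a search backend would provide.

```python
from difflib import SequenceMatcher


def classify_pair(text_a, text_b, duplicate_at=0.98, version_at=0.80):
    """Label a pair of resources by textual similarity.

    The thresholds are illustrative assumptions, not tuned values.
    Returns a (label, score) tuple.
    """
    # Collapse whitespace so formatting-only changes do not lower the score.
    a = " ".join(text_a.split())
    b = " ".join(text_b.split())
    score = SequenceMatcher(None, a, b).ratio()
    if score >= duplicate_at:
        return "duplicate", score
    if score >= version_at:
        return "version", score
    return "unrelated", score
```

Note that a one-character change to a number in a problem statement scores very close to 1 under this measure, which is exactly the case where a small change is significant. A text score alone cannot make that call, so a manual override would still be needed.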
