Seqs
Seq (SEEK) is short for "sequence", our technical term for a sequence of characters. For instance, "asdfs" is a seq. It's not a word, but it is a seq. Why does this matter? Why the hell does this get a header? Why are you making up terms? Well, it all comes down to the many conversations we've had about making the system flexible enough to support different spellings, regions, and so on.
In v1 of Wordset, a "word" has a single "headword", which is the official dictionary term for the primary spelling of a word. It's the thing you look up. For instance, if you look up "color" in an American dictionary, you'll see "color" in bold; that spelling is the headword. Below it there might be a note about the Commonwealth spelling, "colour". As a multi-dialectal team, we certainly don't like one spelling being privileged over the others like that. The headword is a necessary cheat in a traditional dictionary, but we don't have those limitations.
So, imagine that seqs are every possible combination of letters: "collor", "color", and "colour" are all in there. "collor" isn't attached to any wordset, but "color" and "colour" are both attached to the same wordset, each tagged with its regionality (see the later discussion of this). Also, notice that we're using the term "wordset" instead of "word". "Word" seems pretty useless once you think about it. Is it a spelling? Is it a meaning? Hence, we call collections of meanings "wordsets".
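To make the relationship concrete, here is a minimal sketch of seqs attached to a wordset. It is not Wordset's actual schema. The class names, the `attach` and `wordsets_for` helpers, the `"ws-0001"` identifier, and the region tags `"US"`, `"GB"`, and `"AU"` are all hypothetical. The wordset gets a neutral ID rather than being keyed by one spelling, so no seq acts as a headword.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Seq:
    """Any sequence of characters; it need not be a real word."""
    text: str


@dataclass
class Wordset:
    """A collection of meanings, reachable through one or more seqs."""
    wordset_id: str  # hypothetical neutral ID, deliberately not a spelling
    meanings: list[str] = field(default_factory=list)
    # Each attached seq maps to a set of (hypothetical) region tags.
    seqs: dict[Seq, set[str]] = field(default_factory=dict)

    def attach(self, seq: Seq, *regions: str) -> None:
        """Attach a seq to this wordset, adding any region tags given."""
        self.seqs.setdefault(seq, set()).update(regions)


def wordsets_for(seq: Seq, wordsets: list[Wordset]) -> list[Wordset]:
    """Return every wordset the seq is attached to (possibly none)."""
    return [ws for ws in wordsets if seq in ws.seqs]


# "color" and "colour" share one wordset; neither is the headword.
color_ws = Wordset("ws-0001", meanings=["(placeholder meaning)"])
color_ws.attach(Seq("color"), "US")
color_ws.attach(Seq("colour"), "GB", "AU")

all_wordsets = [color_ws]
print([ws.wordset_id for ws in wordsets_for(Seq("colour"), all_wordsets)])  # ['ws-0001']
print(wordsets_for(Seq("collor"), all_wordsets))  # [] : a seq, but unattached
```

Looking up either spelling reaches the same wordset. "collor" is still a valid `Seq`; it simply has nothing attached to it.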
Also, why include misspellings as seqs at all? Why not just call them Spellings? Because we do actually want to deal with realistic usage of language, and in some ways that includes common typos, misspellings, etc. We still want Wordset to primarily represent meaningful attempts at real communication, but that doesn't mean we should pretend misspellings don't exist, especially in our fundamental data structures. This is Computer Science, people! :)
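One way to picture "misspellings exist, but aren't attached" is a store that records every observed seq separately from its wordset attachments. This is a minimal sketch under assumptions, not how Wordset actually stores data. `SeqStore`, its methods, the occurrence counting, and the `"ws-0001"` ID are all hypothetical.

```python
from collections import Counter


class SeqStore:
    """Hypothetical store: every observed character sequence is kept as a seq,
    whether or not it is attached to any wordset."""

    def __init__(self) -> None:
        self.observed: Counter[str] = Counter()      # seq text -> times seen
        self.attachments: dict[str, set[str]] = {}  # seq text -> wordset IDs

    def observe(self, text: str) -> None:
        """Record one occurrence of a seq, typo or not."""
        self.observed[text] += 1

    def attach(self, text: str, wordset_id: str) -> None:
        """Link a seq to a wordset (does not count as an observation)."""
        self.attachments.setdefault(text, set()).add(wordset_id)

    def wordsets_for(self, text: str) -> set[str]:
        """Wordset IDs attached to this seq; empty for unattached seqs."""
        return self.attachments.get(text, set())

    def unattached(self) -> list[str]:
        """Observed seqs with no wordset, e.g. common misspellings."""
        return sorted(t for t in self.observed if t not in self.attachments)


store = SeqStore()
for text in ["color", "colour", "collor", "color"]:
    store.observe(text)
store.attach("color", "ws-0001")
store.attach("colour", "ws-0001")

print(store.unattached())         # ['collor']
print(store.observed["collor"])   # 1
print(store.wordsets_for("collor"))  # set()
```

Keeping observation separate from attachment lets the data acknowledge that "collor" occurs in real usage without treating it as a spelling of any wordset.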