Consecutive Codes · Issue #3 · libindic/soundex · GitHub
Consecutive Codes #3
Open
@chancyk

Description

Consecutive identical codes may not be handled correctly, as the test cases Pfister (expected P236) and Tymczak (expected T522) referenced at http://www.archives.gov/research/census/soundex.html show.

The original Russell and census versions of the algorithm seem to apply this consecutive-code rule only to adjacent letters, i.e. letters not separated by a vowel or another character coded '0'.
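To illustrate the adjacent-only rule, here is a minimal sketch that works on already-mapped digit strings rather than on letters. It assumes American Soundex digits with '0' standing in for vowels and Y; the helper name `collapse_codes` is hypothetical and not part of libindic/soundex.

```python
def collapse_codes(codes: str) -> str:
    """Apply the adjacent-only consecutive-code rule to per-letter digits.

    Assumption: `codes` holds one Soundex digit per letter of the name,
    with '0' marking a vowel. The first digit belongs to the retained
    first letter, so it can suppress a repeat but is not emitted itself.
    """
    collapsed = []
    for digit in codes:
        # Only an immediately preceding identical digit is discarded;
        # a '0' in between breaks the run, so both digits survive.
        if collapsed and collapsed[-1] == digit:
            continue
        collapsed.append(digit)
    # Drop the first letter's digit and all vowel markers, then pad to 3.
    tail = [d for d in collapsed[1:] if d != "0"]
    return ("".join(tail) + "000")[:3]


print(collapse_codes("1102306"))  # Pfister (P f i s t e r): prints 236
print(collapse_codes("3052202"))  # Tymczak (T y m c z a k): prints 522
```

The F in Pfister is dropped because it shares the P's code, while in Tymczak the vowel between Z and K keeps the final 2.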

The archives.gov reference also describes a second special case: when two letters with the same code are separated by an 'H' or 'W', the second code is discarded as well (e.g. Ashcraft is coded A261, not A226).
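A sketch of an encoder that also applies the H/W rule from the archives.gov page, with a flag to switch it off for comparison. It is an illustrative, ASCII-only implementation under those assumptions; `soundex` and `hw_rule` are hypothetical names, not the libindic/soundex API.

```python
# American Soundex letter groups; any letter not listed (vowels, Y, H, W)
# is treated as "0" by the lookups below.
GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
CODES = {ch: digit for letters, digit in GROUPS.items() for ch in letters}


def soundex(name: str, hw_rule: bool = True) -> str:
    """Return a 4-character American Soundex code for an ASCII name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    prev = CODES.get(first, "0")  # the first letter's code still blocks repeats
    digits = []
    for ch in letters[1:]:
        if hw_rule and ch in "HW":
            # H/W are transparent: skip them without resetting `prev`,
            # so same-coded letters on either side count as adjacent.
            continue
        code = CODES.get(ch, "0")  # with hw_rule off, H/W act like vowels
        if code != "0" and code != prev:
            digits.append(code)
        prev = code  # a vowel ("0") resets the run
    return (first + "".join(digits) + "000")[:4]


print(soundex("Ashcraft"))                     # A261
print(soundex("Ashcraft", hw_rule=False))      # A226
print(soundex("Tymczak"), soundex("Pfister"))  # T522 P236
```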

EDIT: The 'H'/'W' rule is in fact used by the SQL Server implementation; I removed my earlier comment saying it is not.

EDIT2: I was both right and wrong before my first edit: MSSQL's handling of 'H' and 'W' is case-sensitive. Consecutive codes separated by an uppercase 'H' or 'W' are discarded, but those separated by a lowercase 'h' or 'w' are not.
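If the case-sensitive behavior described in EDIT2 needs to be reproduced, one way to model it is to treat only uppercase 'H'/'W' as transparent and let lowercase 'h'/'w' reset the run like a vowel. This is a hypothetical sketch based solely on the observation above; it has not been checked against SQL Server's actual SOUNDEX output, and the function name is made up.

```python
# American Soundex letter groups; unlisted letters are treated as "0".
GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
CODES = {ch: digit for letters, digit in GROUPS.items() for ch in letters}


def soundex_case_sensitive_hw(name: str) -> str:
    """Hypothetical model of the reported case-sensitive H/W handling."""
    letters = [ch for ch in name if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0].upper()
    prev = CODES.get(first, "0")
    digits = []
    for ch in letters[1:]:
        if ch in "HW":
            # Uppercase H/W only: skip without resetting `prev`.
            continue
        # Vowels, Y and lowercase h/w all map to "0" and reset the run.
        code = CODES.get(ch.upper(), "0")
        if code != "0" and code != prev:
            digits.append(code)
        prev = code
    return (first + "".join(digits) + "000")[:4]


print(soundex_case_sensitive_hw("ASHCRAFT"))  # A261
print(soundex_case_sensitive_hw("Ashcraft"))  # A226
```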

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
