icd-codex: python library for graphical and continuous representations of ICD9 and ICD10 codes


Thank you for your interest in ICD codex! I (Jeremy) wrote this in 2020 as part of a class project and it has gotten quite a few downloads. However, since then, sequence representation has significantly improved. At this point, **I would not recommend using node2vec to represent ICD codes.** Instead, use a [language model](https://platform.openai.com/docs/guides/embeddings). The node2vec functionality is provided for compatibility with existing projects.

The `networkx` hierarchy remains useful for your modeling requirements.

If there is interest in extending this library for use with modern sequence learning algorithms, please reach out.

What is it?

A Python library for building vector representations of ICD-9 and ICD-10 codes. (2025 comment: the vector representations here are constructed using outdated algorithms.) Because it takes advantage of the hierarchical nature of ICD codes, it also provides these hierarchies in a networkx format. (2025 comment: this data structure should remain useful.)

Motivation

icdcodex was the first prize winner in the Data Driven Healthcare Track of Johns Hopkins' MedHacks 2020. It was hacked together to address the problem of ICD miscodes, which are a major issue for health insurance in the United States. Indeed, while ICD coding is tedious and labour intensive, it is not obvious how to automate because the output space is enormous. For example, ICD-10 CM (clinical modification) has over 70,000 codes and is growing.

There are many strategies for target encoding that address these issues. icdcodex has two features that make ICD classification more amenable to modeling:

  • Access to a networkx tree representation of the ICD-9 and ICD-10 hierarchies
  • Vector embeddings of ICD codes using the node2vec algorithm (including pre-computed embeddings and an interface to create new embeddings)
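To make the first feature concrete, here is a minimal sketch of why a tree view of the codes helps with modeling. It is not the library's API: it uses a tiny hand-written hierarchy stored in a plain dictionary instead of the networkx graph that icdcodex provides, and the node labels and the helper names `path_to_root` and `tree_distance` are illustrative assumptions.

```python
# Toy child -> parent map for a handful of ICD-9 codes.
# Node labels are illustrative and may not match the labels icdcodex uses.
PARENT = {
    "001-139": "root",    # chapter: infectious and parasitic diseases
    "001": "001-139",     # cholera
    "0010": "001",        # cholera due to vibrio cholerae
    "0011": "001",        # cholera due to vibrio cholerae el tor
    "002": "001-139",     # typhoid and paratyphoid fevers
    "0020": "002",        # typhoid fever
    "390-459": "root",    # chapter: diseases of the circulatory system
    "410": "390-459",     # acute myocardial infarction
}


def path_to_root(code):
    """Return the list of nodes from `code` up to and including the root."""
    path = [code]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a, b):
    """Count the edges between two nodes via their lowest common ancestor."""
    path_a = path_to_root(a)
    steps_from_a = {node: i for i, node in enumerate(path_a)}
    for steps_from_b, node in enumerate(path_to_root(b)):
        if node in steps_from_a:
            return steps_from_a[node] + steps_from_b
    raise ValueError("nodes do not share a root")


if __name__ == "__main__":
    for other in ("0011", "0020", "410"):
        print("0010 ->", other, tree_distance("0010", other))
```

Codes in the same category end up closer together than codes in different chapters, which is the kind of structure a flat list of 70,000 labels does not expose.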

Example Code

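from icdcodex import icd2vec, hierarchy

# Fit a 64-dimensional node2vec embedding on the ICD-9 hierarchy
embedder = icd2vec.Icd2Vec(num_embedding_dimensions=64)
embedder.fit(*hierarchy.icd9())
X = get_patient_covariates()  # placeholder for your own feature-loading function
y = embedder.to_vec(["0010"])  # Cholera due to vibrio cholerae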

In this case, y is a 64-dimensional vector close to other Infectious And Parasitic Diseases codes.
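One way to read "close" here is cosine similarity between embedding vectors, although the choice of metric is an assumption and not something this README specifies. The sketch below uses made-up 4-dimensional vectors as stand-ins for real 64-dimensional embeddings, so the numbers are purely illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical vectors, NOT output from icdcodex.
cholera = [0.9, 0.1, 0.0, 0.2]      # stand-in for code 0010
typhoid = [0.8, 0.2, 0.1, 0.1]      # stand-in for another infectious disease code
infarction = [0.0, 0.1, 0.9, 0.3]   # stand-in for a circulatory system code

if __name__ == "__main__":
    print("cholera vs typhoid:   ", cosine_similarity(cholera, typhoid))
    print("cholera vs infarction:", cosine_similarity(cholera, infarction))
```

With embeddings that respect the hierarchy, the first similarity would be expected to be higher than the second, as it is for these toy vectors.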

Related Work

The Hackathon Team

  • Jeremy Fisher (Maintainer)
  • Alhusain Abdalla
  • Natasha Nehra
  • Tejas Patel
  • Hamrish Saravanakumar

Documentation

See the full documentation: https://icd-codex.readthedocs.io/en/latest/

Contributions

Contributions are always welcome!
