GitHub - simomarsili/pcdhit: Python interface to cd-hit

simomarsili/pcdhit

pcdhit

Python interface to the cd-hit clustering program. Given a collection of redundant sequence records (an iterable of (header, sequence) tuples) and a sequence identity threshold, the filter function returns an iterable of non-redundant records:

>>> import pcdhit
>>> filtered_records = pcdhit.filter(records, thr=0.9)
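
As a small illustration of the expected input, the sketch below builds `records` as (header, sequence) tuples. The headers and sequences are made-up toy data, and `iter_records` is a hypothetical helper, not part of pcdhit; it only shows that a generator can serve as the iterable.

```python
# Hypothetical toy data: each record is a (header, sequence) tuple.
records = [
    ("seq1", "MKTAYIAKQRQISFVKSHFSRQ"),
    ("seq2", "MKTAYIAKQRQISFVKSHFSRQ"),  # exact duplicate of seq1
    ("seq3", "GSHMLEDPVDAFKLNRWWQ"),
]


def iter_records(pairs):
    """Hypothetical helper: lazily yield (header, sequence) tuples."""
    for header, sequence in pairs:
        yield header, sequence


# With pcdhit and cd-hit installed, the call from the README would be:
# filtered_records = pcdhit.filter(iter_records(records), thr=0.9)
```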

In practice, the function dumps the redundant records to a FASTA file, runs cd-hit:

$ cdhit -i <redundant_fasta> -o <non_redundant_fasta> -c <threshold>

via subprocess.Popen, and parses the non-redundant records from the cd-hit output file.
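
The three steps above (dump, run, parse) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual source: `write_fasta`, `read_fasta`, and `filter_sketch` are hypothetical names, the temporary file names are arbitrary, and the executable is assumed to be available on the PATH as `cdhit`, as in the command shown above.

```python
import subprocess
import tempfile
from pathlib import Path


def write_fasta(records, path):
    """Write (header, sequence) tuples to a FASTA file."""
    with open(path, "w") as handle:
        for header, sequence in records:
            handle.write(f">{header}\n{sequence}\n")


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_sketch(records, thr=0.9, executable="cdhit"):
    """Hypothetical stand-in for pcdhit.filter (assumed structure)."""
    with tempfile.TemporaryDirectory() as tmp:
        fasta_in = Path(tmp) / "redundant.fasta"
        fasta_out = Path(tmp) / "non_redundant.fasta"
        write_fasta(records, fasta_in)
        # Same options as the command line shown in the README.
        proc = subprocess.Popen(
            [executable, "-i", str(fasta_in), "-o", str(fasta_out), "-c", str(thr)],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        _, stderr = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(stderr.decode(errors="replace"))
        # Read everything before the temporary directory is removed.
        return list(read_fasta(fasta_out))
```

Calling `proc.communicate()` rather than `proc.wait()` drains the output pipes, which avoids a deadlock if the external program prints a lot of text. The result is returned as a list here because the temporary files are deleted when the `with` block exits.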
