tcorpus is a collection of high-level tools for text corpus preparation and discourse analysis. It is being developed and used for research at the Chair of Economic Geography and Sustainable Development, University of Freiburg. Things may change and break regularly, but you are welcome to see if any of it is useful.
The package relies on several dependencies for natural language processing tasks. Among others, it uses:
- flair for named entity recognition
- syntok for segmentation and tokenization
- NLTK for parsing grammatical structures
While tcorpus is free to use and distribute under an MIT License, this may not be the case for all dependencies. Please check whether the dependency licenses cover your use case.
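As a starting point for that check, here is a minimal sketch that prints the license each installed dependency declares in its package metadata. It is not part of tcorpus: the helper names `declared_license` and `license_report` are hypothetical, and the distribution names `flair`, `syntok` and `nltk` are assumed to match the installed packages. It uses only the standard library module `importlib.metadata`.

```python
from importlib import metadata


def declared_license(dist_name):
    """Return the license a distribution declares, or None if it is not installed.

    Hypothetical helper for illustration; not part of tcorpus.
    """
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None

    # Prefer trove classifiers such as "License :: OSI Approved :: MIT License".
    classifiers = [
        c.split(" :: ")[-1]
        for c in (meta.get_all("Classifier") or [])
        if c.startswith("License ::")
    ]
    if classifiers:
        return "; ".join(classifiers)

    # Fall back to the free-text license fields, which may be missing.
    return meta.get("License-Expression") or meta.get("License") or "unknown"


def license_report(dist_names):
    """Map each distribution name to its declared license (None if absent)."""
    return {name: declared_license(name) for name in dist_names}


if __name__ == "__main__":
    # Assumed distribution names for the dependencies listed above.
    for name, lic in license_report(["flair", "syntok", "nltk"]).items():
        print(f"{name}: {lic or 'not installed'}")
```

Declared metadata can be incomplete or out of date, and it does not cover models or data that a dependency downloads at runtime, so treat the output as a hint and read the actual license texts before relying on it.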