Extractous

Extractous offers a unified approach for detecting and extracting metadata and text content from various document types such as PDF, Word, HTML, and many other formats. Our goal is to deliver an efficient, comprehensive solution with bindings for many programming languages.

Why Extractous?

Extractous was mainly inspired by the Unstructured Python library. While Unstructured offers a good solution for parsing unstructured content, we see two main issues with it:

Performance: data processing is mainly a CPU-bound problem, and Python is not the best choice for such tasks because its Global Interpreter Lock (GIL) makes it hard to utilize multiple cores.
Unstructured is becoming more of an LLM framework than just a text and metadata parsing library.

Extractous will focus only on text and metadata extraction. The core is written in Rust, leveraging its memory safety, multithreading, and zero-cost abstractions. Extractous will provide bindings for many programming languages.

Features

Clear, simple API for extracting text and metadata content.
Support for many file formats.
Strives to be efficient and fast.
Comprehensive documentation and examples to help you get started quickly.

Bindings

Name	Release
Rust Core
Python Binding

Supported file formats

File Format	Rust Core	Python Binding
pdf	✅	✅
csv	✅	✅
