A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
Updated Jun 21, 2025 - Python
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
A simple resume parser used for extracting information from resumes
Turn webpages into LLM-friendly input text. Similar to Firecrawl and Jina Reader API. Makes RAG, AI web scraping, and image and webpage link extraction easy.
Extract data from HTML tables.
Extract colors from an image. Colors are grouped based on visual similarities using the CIE76 formula.
Get lyrics for any song in less than 2 seconds by passing in the song name (spelled correctly or misspelled) using this Python library.
This program extracts insider trading data from the SEC website and stores it in an Excel file for the specified time frame.
Unofficial Python client for Twitter
Extract audio and other data from the Digitech Trio Plus guitar pedal's SD card
Extract structured data from any unstructured web page
A simple UI tool to batch crop images to prepare datasets from images and videos.
Various Python utility scripts to help automate mundane or repetitive tasks. Useful for performance testers, data scientists, or anyone who wants to automate mundane tasks in Python.
A Python module for reading data from a plot provided as an SVG file.
Extract data from Octopus mdict (*.mdd, *.mdx) files
A library for making batch requests to the Google Analytics Core Reporting API v3 and extracting data from a Google Analytics property into Python 3 data structures.
A toolkit for extracting elements and visualization for Waymo Open Dataset
A tool designed to extract numerical data from scanned historical weather documents.
Singer Tap for dbt API v2 built with the Meltano SDK