- CatFix Technology
- Dallas
- www.johnshelburne.com
- @thecatfix
- @johnshelburne.com
- https://gist.github.com/thecatfix
- https://bento.me/john-shelburne
Data Extraction
Easy access to IAB Tech Lab taxonomies, including Content, Audience and Ad Product
A demo Jupyter Notebook showcasing a simple local RAG (Retrieval Augmented Generation) pipeline to chat with your PDFs.
wallabag is a self-hostable application for saving web pages: Save and classify articles. Read them later. Freely.
DuckDB is an analytical in-process SQL database management system
SemanticPDF: Drag, Drop, Semantic Search - SemanticPDF is a simple, privacy-focused application that makes it easy to upload a PDF file and perform a semantic search on its contents.
💫 Industrial-strength Natural Language Processing (NLP) in Python
Code I wrote for my AI & LLM workshops
Export/Backup Spotify playlists using the Web API
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website …
🎓 Practical beginner-level introductions to using different tools and technologies, with a focus on their application in the newsroom
A network filesystem client to connect to SSH servers
A time-series database for high-performance real-time analytics packaged as a Postgres extension
Data files (.csv) accessed with nflscrapR and summarized at the player-level
hudi-packages-connectors is a library that provides a toolset to parse and extract relevant information from the personal data sources provided by major websites or social networks.
LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker, and Scrapy
Public resources related to Thinkful's data science bootcamp
Using XPDF, pdftojson extracts text from PDF files as JSON, including word bounding boxes.
Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.
Swiss-army tool for scraping and extracting data from online assets, made for hackers
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
ScrapPY is a Python utility for scraping manuals, documents, and other sensitive PDFs to generate wordlists that can be utilized by offensive security tools to perform brute force, forced browsing,…