ingestr is a CLI tool that seamlessly copies data between databases with a single command.
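The idea of single-command database-to-database copying can be illustrated with a minimal stdlib sketch. This is not ingestr's actual implementation (ingestr supports many engines through its own connectors); the `copy_table` helper below is hypothetical and uses SQLite on both ends to keep the example self-contained.

```python
import sqlite3

def copy_table(source_uri: str, dest_uri: str, table: str) -> int:
    """Copy all rows of `table` from one SQLite database to another.

    A toy sketch of the single-command copy idea; real tools add type
    mapping, batching, incremental loads, and identifier quoting.
    """
    src = sqlite3.connect(source_uri)
    dst = sqlite3.connect(dest_uri)
    try:
        cur = src.execute(f"SELECT * FROM {table}")
        cols = [d[0] for d in cur.description]
        rows = cur.fetchall()
        col_list = ", ".join(cols)
        placeholders = ", ".join("?" for _ in cols)
        # Create a schemaless target table (SQLite allows untyped columns).
        dst.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
        dst.executemany(
            f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
        )
        dst.commit()
        return len(rows)
    finally:
        src.close()
        dst.close()
```

Usage mirrors the CLI's spirit: one call names the source, the destination, and the table to move.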
Build complete API integrations with YAML and SQL. Rapid development without vendor lock-in or per-row costs.
Sample code for the AWS Big Data Blog post "Building a scalable streaming data processor with Amazon Kinesis Data Streams on AWS Fargate".
Enables custom tracing of Python applications in Dynatrace
Product scraping from the Walmart Canada website, with further cleaning and integration of data from a different store.
Apache Paimon Python: the Python implementation of Apache Paimon.
End-to-end data engineering processes for the Nigeria Health Facility Registry (HFR). The project leveraged Selenium, Pandas, PySpark, PostgreSQL, and Airflow.
The script automates the collection and insertion of KPIs related to transaction time and storage usage into a Data Warehouse, using Apache Airflow. It calculates the time elapsed since the last transaction and the percentage of storage used, recording these values periodically in dedicated tables.
RKI Metadata Exchange | API and GUI microservice for distributing metadata items before they are picked up by ETL pipelines for further processing.
IDPS-ESCAPE (Intrusion Detection and Prevention Systems for Evading Supply Chain Attacks and Post-compromise Effects), part of project CyFORT: an open-source SOAR system powered by a deep learning-based anomaly detection toolbox (ADBox) and a risk-aware AD-based active response (RADAR) subsystem, integrated with OSS such as Wazuh and Suricata.
Built a real-time data streaming system using the Hadoop ecosystem that performs data extraction, ingestion, storage, retrieval, transformation, and analysis in real time.
Developed a real-time data ingestion pipeline using Kafka and Spark: it collects minute-level stock data from Yahoo Finance, ingests it into Kafka, and processes it with Spark Streaming, storing the results in Cassandra. The workflow is orchestrated with Airflow deployed on Docker.
Python script to extract all .csv/.txt files from a specific AWS S3 bucket and generate the .sql scripts to ingest the files into an AWS Redshift database.
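The generation step of such a script can be sketched as follows. This is an assumption about how the project works, not its actual code: the real script would list the bucket with boto3, while the hypothetical `build_copy_statements` helper below only turns a list of object keys into Redshift COPY statements, deriving the table name from the file stem and the delimiter from the extension.

```python
def build_copy_statements(bucket: str, keys: list[str],
                          schema: str, iam_role: str) -> list[str]:
    """Emit one Redshift COPY statement per .csv/.txt object key."""
    stmts = []
    for key in keys:
        stem, _, ext = key.rpartition(".")
        if ext not in ("csv", "txt"):
            continue  # only ingest delimited text files
        table = stem.split("/")[-1]          # e.g. raw/sales.csv -> sales
        delim = "," if ext == "csv" else "\\t"
        stmts.append(
            f"COPY {schema}.{table} FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{iam_role}' DELIMITER '{delim}' IGNOREHEADER 1;"
        )
    return stmts
```

Writing each statement to a .sql file then gives a reviewable ingestion script per bucket listing.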
Infer SQL DDL statements from tabular data.
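A minimal version of DDL inference can be written in a few lines of stdlib Python. This sketch is not the listed project's algorithm; it assumes a simple type ladder (INTEGER if every value parses as an int, DOUBLE PRECISION if every value parses as a float, TEXT otherwise) and a CSV sample with a header row.

```python
import csv
import io

def infer_ddl(table: str, csv_text: str) -> str:
    """Infer a CREATE TABLE statement from a CSV sample with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    def col_type(values: list) -> str:
        # Widen from INTEGER to DOUBLE PRECISION to TEXT as parsing fails.
        try:
            for v in values:
                int(v)
            return "INTEGER"
        except ValueError:
            pass
        try:
            for v in values:
                float(v)
            return "DOUBLE PRECISION"
        except ValueError:
            return "TEXT"

    cols = [
        f"{name} {col_type([r[i] for r in data])}"
        for i, name in enumerate(header)
    ]
    return f"CREATE TABLE {table} (" + ", ".join(cols) + ");"
```

A production tool would also handle NULLs, dates, quoting of identifiers, and per-dialect type names.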
This project involves analyzing AdventureWorks bike sales data to uncover key insights into sales performance by country, customer segments, and products. The findings informed strategies for targeted marketing, market expansion, promotional timing, and product quality improvements.
CLI that helps split documents, embed them, and expose them seamlessly.
Data ingestion from Google Sheet to BigQuery
This repository contains content related to data engineering using AWS.
In this project, you will build a full AI pipeline for an image classification task using Convolutional Neural Networks (CNNs). It covers data ingestion, preprocessing, model training, deployment, and CI/CD integration using GitHub Actions, Docker, and AWS.