ingestr is a CLI tool that seamlessly copies data between databases with a single command.
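The idea of single-command database-to-database copying can be illustrated with a minimal stdlib sketch. This is not ingestr's actual implementation (ingestr supports many engines through its own connectors); the `copy_table` helper below is hypothetical and uses SQLite on both ends to keep the example self-contained.

```python
import sqlite3

def copy_table(source_uri: str, dest_uri: str, table: str) -> int:
    """Copy all rows of `table` from one SQLite database to another.

    A toy sketch of the single-command copy idea; real tools add type
    mapping, batching, incremental loads, and identifier quoting.
    """
    src = sqlite3.connect(source_uri)
    dst = sqlite3.connect(dest_uri)
    try:
        cur = src.execute(f"SELECT * FROM {table}")
        cols = [d[0] for d in cur.description]
        rows = cur.fetchall()
        col_list = ", ".join(cols)
        placeholders = ", ".join("?" for _ in cols)
        # Create a schemaless target table (SQLite allows untyped columns).
        dst.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
        dst.executemany(
            f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
        )
        dst.commit()
        return len(rows)
    finally:
        src.close()
        dst.close()
```

Usage mirrors the CLI's spirit: one call names the source, the destination, and the table to move.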
Build complete API integrations with YAML and SQL. Rapid development without vendor lock-in or per-row costs.
Sample code for the AWS Big Data Blog post "Building a scalable streaming data processor with Amazon Kinesis Data Streams on AWS Fargate".
Enables custom tracing of Python applications in Dynatrace
Product scraping from the Walmart Canada website, with further cleaning and integration of data from a different store.
Apache Paimon Python: the Python implementation of Apache Paimon.
End-to-end data engineering processes for the Nigeria Health Facility Registry (HFR). The project leveraged Selenium, Pandas, PySpark, PostgreSQL, and Airflow.
The script automates the collection and insertion of KPIs related to transaction time and storage usage into a Data Warehouse, using Apache Airflow. It calculates the time elapsed since the last transaction and the percentage of storage used, recording these values periodically in dedicated tables.
RKI Metadata Exchange | API and GUI microservice for distributing metadata items before they are picked up by ETL pipelines for further processing.
IDPS-ESCAPE (Intrusion Detection and Prevention Systems for Evading Supply Chain Attacks and Post-compromise Effects), part of project CyFORT: an open-source SOAR system powered by a deep learning-based anomaly detection toolbox (ADBox) and a risk-aware AD-based active response (RADAR) subsystem, integrated with OSS such as Wazuh and Suricata.
Built a real-time data streaming system using the Hadoop ecosystem that performs data extraction, ingestion, storage, retrieval, transformation, and analysis in real time.
Developed a real-time data ingestion pipeline using Kafka and Spark: it collects minute-level stock data from Yahoo Finance, ingests it into Kafka, and processes it with Spark Streaming, storing the results in Cassandra. The workflow is orchestrated with Airflow deployed on Docker.
Python script to extract all .csv/.txt files from a specific AWS S3 bucket and generate the .sql scripts to ingest the files into an AWS Redshift database.
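The generation step of such a script can be sketched as follows. This is an assumption about how the project works, not its actual code: the real script would list the bucket with boto3, while the hypothetical `build_copy_statements` helper below only turns a list of object keys into Redshift COPY statements, deriving the table name from the file stem and the delimiter from the extension.

```python
def build_copy_statements(bucket: str, keys: list[str],
                          schema: str, iam_role: str) -> list[str]:
    """Emit one Redshift COPY statement per .csv/.txt object key."""
    stmts = []
    for key in keys:
        stem, _, ext = key.rpartition(".")
        if ext not in ("csv", "txt"):
            continue  # only ingest delimited text files
        table = stem.split("/")[-1]          # e.g. raw/sales.csv -> sales
        delim = "," if ext == "csv" else "\\t"
        stmts.append(
            f"COPY {schema}.{table} FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{iam_role}' DELIMITER '{delim}' IGNOREHEADER 1;"
        )
    return stmts
```

Writing each statement to a .sql file then gives a reviewable ingestion script per bucket listing.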
Infer SQL DDL statements from tabular data.
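A minimal version of DDL inference can be written in a few lines of stdlib Python. This sketch is not the listed project's algorithm; it assumes a simple type ladder (INTEGER if every value parses as an int, DOUBLE PRECISION if every value parses as a float, TEXT otherwise) and a CSV sample with a header row.

```python
import csv
import io

def infer_ddl(table: str, csv_text: str) -> str:
    """Infer a CREATE TABLE statement from a CSV sample with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    def col_type(values: list) -> str:
        # Widen from INTEGER to DOUBLE PRECISION to TEXT as parsing fails.
        try:
            for v in values:
                int(v)
            return "INTEGER"
        except ValueError:
            pass
        try:
            for v in values:
                float(v)
            return "DOUBLE PRECISION"
        except ValueError:
            return "TEXT"

    cols = [
        f"{name} {col_type([r[i] for r in data])}"
        for i, name in enumerate(header)
    ]
    return f"CREATE TABLE {table} (" + ", ".join(cols) + ");"
```

A production tool would also handle NULLs, dates, quoting of identifiers, and per-dialect type names.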
This project involves analyzing AdventureWorks bike sales data to uncover key insights into sales performance by country, customer segments, and products. The findings informed strategies for targeted marketing, market expansion, promotional timing, and product quality improvements.
CLI that helps split documents, embed them, and expose them seamlessly.
Data ingestion from Google Sheet to BigQuery
This repository contains content related to data engineering using AWS.
In this project, you will build a full AI pipeline for an image classification task using Convolutional Neural Networks (CNNs). It covers data ingestion, preprocessing, model training, deployment, and CI/CD integration using GitHub Actions, Docker, and AWS.