I'm a Data Engineer who designs and optimizes robust, scalable data pipelines. My work spans end-to-end ETL processes, advanced data transformation, geospatial processing, cloud integrations, and machine learning projects, applying modern analytical techniques to real-world data challenges.
- End-to-End ETL Development:
  - Designed and led projects that extract data from APIs and AWS S3 (with pagination and validation), transform it with custom logic, and load it into PostgreSQL.
- Automation & Performance Optimization:
  - Automated ETL processes for multi-state oil & gas production datasets using Python, Pandas, and Apache Spark, reducing processing time by 60%.
- Cloud Integration & Data Migration:
  - Used AWS services such as S3, EC2, Glue, Lambda, ECS, MWAA, RDS, and Secrets Manager to optimize data pipelines.
  - Migrated legacy data from MongoDB to PostgreSQL while preserving data integrity and improving performance.
- Complex Data Transformation:
  - Developed robust logic to handle date formatting, JSON-style columns, missing values, and coordinate validation.
- Directional Survey Processing:
  - Built scripts for merging CSV files, applying cubic spline interpolation, and performing coordinate transformations.
  - Used Pyproj and Shapely to convert wellbore survey data into precise 2D/3D LineString geometries, and organized the outputs in AWS S3 for efficient workflows.
- Real-Time Data Integration:
  - Integrated REST APIs to retrieve and process real-time permit and well data from various state energy departments.
- Secure Cloud Architectures:
  - Configured EC2 Security Groups, VPC settings, and IAM roles (following the principle of least privilege) to safeguard communication between services.
- CloudWatch Alerts for Slack – Database Monitor:
  - Developed an integration connecting AWS CloudWatch with Slack to monitor critical database metrics.
  - Configured CloudWatch to track performance indicators (such as CPU usage, memory, slow queries, and IOPS) and trigger alerts via AWS Lambda and SNS, enabling proactive incident response.
- Model Development & Evaluation:
  - Completed multiple academic projects involving classification, regression, clustering, and neural networks.
  - Used popular ML libraries (scikit-learn, TensorFlow, PyTorch) for model building, feature engineering, hyperparameter tuning, and evaluation.
- End-to-End Workflows:
  - Conducted comprehensive exploratory data analysis (EDA), data preprocessing, model training, and deployment, demonstrating practical ML solutions across diverse domains.
- Programming & Scripting: Python, SQL
- Data Processing & Analytics: Pandas, Apache Spark, Polars
- Cloud Platforms: AWS (S3, EC2, Glue, Redshift, MWAA, RDS, Lambda, ECS, Secrets Manager, CloudWatch)
- Database Systems: PostgreSQL, MongoDB
- Geospatial Tools: Pyproj, Shapely
- Workflow Orchestration: Apache Airflow
- Machine Learning & Deep Learning: scikit-learn, TensorFlow, PyTorch, OpenCV
- Alerting & Monitoring: AWS Lambda, SNS, Slack Webhooks
- LinkedIn: LinkedIn URL
Thank you for stopping by my profile! I'm always excited to connect, collaborate, and explore innovative solutions in data engineering and machine learning.