ErRsah (Roshan Sah) · GitHub
ErRsah/README.md

Hi, I'm Roshan Sah! 👋

I'm a Data Engineer passionate about designing and optimizing robust, scalable data pipelines. My work spans end-to-end ETL processes, advanced data transformation, geospatial processing, cloud integrations, and machine learning projects—combining real-world data challenges with cutting-edge analytical techniques.


🔧 Expertise & Experience

Data Engineering & ETL Pipelines

  • End-to-End ETL Development:
    • Designed and led projects that extract data from APIs and AWS S3 (with pagination and validation), transform data with custom logic, and load it into PostgreSQL.
  • Automation & Performance Optimization:
    • Automated ETL processes for multi-state oil & gas production datasets using Python, Pandas, and Apache Spark, reducing processing time by 60%.
  • Cloud Integration & Data Migration:
    • Utilized AWS services such as S3, EC2, Glue, Lambda, ECS, MWAA, RDS, and Secrets Manager to optimize data pipelines.
    • Migrated legacy data from MongoDB to PostgreSQL while ensuring data integrity and performance improvements.
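The extract step described above (paginated reads followed by validation) can be sketched as follows. This is a minimal illustration, not the actual pipeline: the page shape (`items`, `next_token`), the field names (`well_id`, `latitude`, `longitude`), and the in-memory `fake_fetch` source are all assumptions made so the example runs without network access.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page shape: {"items": [...], "next_token": "<str>" or None}


def iter_records(fetch_page: Callable[[Optional[str]], dict],
                 max_pages: int = 1000) -> Iterator[dict]:
    """Yield every record from a token-paginated source."""
    token = None
    for _ in range(max_pages):
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("next_token")
        if not token:
            return  # last page reached
    raise RuntimeError("pagination did not terminate within max_pages")


def is_valid(record: dict) -> bool:
    """Hypothetical rule: a non-empty id and in-range coordinates."""
    try:
        lat = float(record["latitude"])
        lon = float(record["longitude"])
    except (KeyError, TypeError, ValueError):
        return False
    return bool(record.get("well_id")) and -90 <= lat <= 90 and -180 <= lon <= 180


def extract_valid(fetch_page: Callable[[Optional[str]], dict]):
    """Split extracted records into (valid, rejected) lists."""
    valid, rejected = [], []
    for record in iter_records(fetch_page):
        (valid if is_valid(record) else rejected).append(record)
    return valid, rejected


# In-memory stand-in for an API (made-up data for illustration only).
_PAGES = {
    None: {"items": [{"well_id": "A-1", "latitude": 31.9, "longitude": -102.1}],
           "next_token": "p2"},
    "p2": {"items": [{"well_id": "B-2", "latitude": "47.5", "longitude": "-103.2"},
                     {"well_id": "C-3", "latitude": 123.0, "longitude": 0.0}],
           "next_token": None},
}


def fake_fetch(token: Optional[str]) -> dict:
    return _PAGES[token]
```

Calling `extract_valid(fake_fetch)` walks both pages and rejects the record whose latitude is out of range. Keeping rejected rows instead of dropping them silently makes it possible to log or quarantine them before the load into PostgreSQL.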

Advanced Data Transformation & Geospatial Processing

  • Complex Data Transformation:
    • Developed robust logic to handle date formatting, JSON-style columns, missing values, and coordinate validations.
  • Directional Survey Processing:
    • Built scripts for merging CSV files, applying cubic spline interpolation, and performing coordinate transformations.
    • Leveraged Pyproj and Shapely to convert wellbore survey data into precise 2D/3D LineString geometries and organized outputs in AWS S3 for efficient workflows.
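As a rough illustration of the interpolation step, the sketch below fits a natural cubic spline to survey stations and resamples the path at a fixed measured-depth step. It is a dependency-free stand-in, not the scripts described above: it writes the geometry as WKT text rather than using Shapely, it skips the Pyproj coordinate transformation by assuming the east/north offsets are already in a projected system, and the function names and the natural boundary condition are assumptions.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def natural_cubic_spline(xs: Sequence[float],
                         ys: Sequence[float]) -> Callable[[float], float]:
    """Return an evaluator for the natural cubic spline through (xs, ys)."""
    xs, ys = list(xs), list(ys)
    n = len(xs) - 1
    if n < 1 or len(ys) != len(xs):
        raise ValueError("need at least two points and equal-length inputs")
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("xs must be strictly increasing")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; natural ends keep m[0] = m[n] = 0
    if n > 1:
        # Tridiagonal system for the interior second derivatives m[1..n-1].
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
        rhs = [6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1]
                      - (ys[k + 1] - ys[k]) / h[k]) for k in range(n - 1)]
        for k in range(1, n - 1):  # forward elimination (Thomas algorithm)
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]
        for k in range(n - 3, -1, -1):  # back substitution
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(t: float) -> float:
        # Pick the interval containing t, clamped to the first/last segment.
        i = min(max(bisect_right(xs, t) - 1, 0), n - 1)
        a, b = xs[i + 1] - t, t - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


def survey_to_wkt(md, east, north, tvd, step: float) -> str:
    """Resample a survey every `step` units of measured depth as 3D WKT."""
    if step <= 0:
        raise ValueError("step must be positive")
    f_east, f_north, f_tvd = (natural_cubic_spline(md, v)
                              for v in (east, north, tvd))
    count = int((md[-1] - md[0]) // step)
    depths = [md[0] + k * step for k in range(count + 1)]
    if depths[-1] < md[-1]:
        depths.append(md[-1])  # always include the final station
    # z is written as TVD, positive downward (an assumption of this sketch).
    coords = ", ".join(f"{f_east(d):.2f} {f_north(d):.2f} {f_tvd(d):.2f}"
                       for d in depths)
    return f"LINESTRING Z ({coords})"
```

Interpolating each coordinate against measured depth keeps the resampled path passing through every original station. In a setup with SciPy and Shapely available, `scipy.interpolate.CubicSpline` and `shapely.geometry.LineString` would be the more usual tools for the same two steps.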

Cloud-Based Monitoring & Security

  • Real-Time Data Integration:
    • Integrated REST APIs to retrieve and process real-time permit and well data from various state energy departments.
  • Secure Cloud Architectures:
    • Configured EC2 Security Groups, VPC settings, and IAM roles (following the least privilege principle) to safeguard communications between services.
  • CloudWatch Alerts for Slack – Database Monitor:
    • Developed an integration connecting AWS CloudWatch with Slack to monitor critical database metrics.
    • Configured CloudWatch to track performance indicators (such as CPU usage, memory, slow queries, and IOPS) and trigger alerts via AWS Lambda and SNS, enabling proactive incident response.
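One plausible shape for the Lambda side of such an alerting path is sketched below: a CloudWatch alarm publishes to SNS, SNS invokes the function, and the function forwards a short message to a Slack incoming webhook. This is an assumption-laden sketch rather than the integration described above; the `SLACK_WEBHOOK_URL` environment variable, the emoji mapping, and the message layout are all hypothetical.

```python
import json
import os
import urllib.request

# Hypothetical mapping from alarm state to a Slack emoji.
_ICONS = {"ALARM": ":rotating_light:", "OK": ":white_check_mark:"}


def build_slack_payload(event: dict) -> dict:
    """Turn an SNS-delivered CloudWatch alarm event into a Slack message."""
    # SNS wraps the alarm as a JSON string in Records[0].Sns.Message.
    alarm = json.loads(event["Records"][0]["Sns"]["Message"])
    state = alarm.get("NewStateValue", "UNKNOWN")
    icon = _ICONS.get(state, ":grey_question:")
    name = alarm.get("AlarmName", "unnamed alarm")
    reason = alarm.get("NewStateReason", "")
    return {"text": f"{icon} *{name}* is {state}\n{reason}"}


def lambda_handler(event, context):
    """Lambda entry point: post the formatted alarm to Slack."""
    url = os.environ["SLACK_WEBHOOK_URL"]  # assumed to be set on the function
    body = json.dumps(build_slack_payload(event)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return {"status": response.status}
```

Separating `build_slack_payload` from the handler keeps the formatting logic testable without network access. The webhook URL could equally be read from Secrets Manager instead of an environment variable.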

Academic Machine Learning Projects

  • Model Development & Evaluation:
    • Completed multiple academic projects involving classification, regression, clustering, and neural networks.
    • Utilized popular ML libraries (scikit-learn, TensorFlow, PyTorch) for model building, feature engineering, hyperparameter tuning, and evaluation.
  • End-to-End Workflows:
    • Conducted comprehensive exploratory data analysis (EDA), data preprocessing, model training, and deployment—demonstrating practical ML solutions across diverse domains.

🚀 Skills & Technologies

  • Programming & Scripting: Python, SQL
  • Data Processing & Analytics: Pandas, Apache Spark, Polars
  • Cloud Platforms: AWS (S3, EC2, Glue, Redshift, MWAA, RDS, Lambda, ECS, Secrets Manager, CloudWatch)
  • Database Systems: PostgreSQL, MongoDB
  • Geospatial Tools: Pyproj, Shapely
  • Workflow Orchestration: Apache Airflow
  • Machine Learning & Deep Learning: scikit-learn, TensorFlow, PyTorch, OpenCV
  • Alerting & Monitoring: AWS Lambda, SNS, Slack Webhooks

📫 Let's Connect

Thank you for stopping by my profile! I'm always excited to connect, collaborate, and explore innovative solutions in data engineering and machine learning.

Popular repositories

  1. CSCE-5310 (Public)

    Jupyter Notebook 1

  2. CS5218 (Public)

    Jupyter Notebook

  3. solr (Public)

    Forked from apache/solr

    Apache Solr open-source search software

    Java

  4. NLP (Public)

    Jupyter Notebook

  5. CSCE5222_Image-De-noising-Using-Deep-Learning (Public)

    Jupyter Notebook

  6. Spam-Detection (Public)

    Jupyter Notebook