ved1beta (वेदांत) · GitHub
🍊 santra

Sponsors

@aman-17

ved1beta/README.md


Things I Do :)

  • Triton: writing custom Triton kernels for better performance, and working on some larger kernel projects
  • CUDA: studying CUDA architecture for a deeper understanding of kernels and Triton
  • Deep Learning: computer vision, NLP, and more :)

Technical Skills 🛠️

  • Languages: Python, CUDA, C++
  • Frameworks & Libraries: PyTorch, Pandas, Matplotlib, Triton, mpi4py
  • Tools & Platforms: GitHub, Docker, Vercel, Neovim, VS Code, Jupyter Notebook, AWS
  • Machine Learning: statistical analysis and predictive modeling (regression, decision trees, random forests) plus boosted and gradient-based methods (CatBoost, SGD), with a strong focus on optimization and accuracy.

Key Projects 📚

CUDA

  • GPU Sanghathan: small-scale distributed training of sequential deep learning models, built on NumPy and MPI.
  • Cuda writer: CUDA kernels written from scratch, from vector addition (vec_add) up to Flash Attention, plus model implementations from scratch.
  • Flash attention: an implementation of Flash Attention in Triton.
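
The GPU Sanghathan project above trains data-parallel replicas over NumPy and MPI. A minimal sketch of the core step — averaging gradients across workers, the job MPI's allreduce performs — can be shown in plain NumPy (the function name and worker setup here are illustrative, not the project's actual code):

```python
import numpy as np

def allreduce_mean(grads):
    """Average one gradient array across workers.

    grads: list of np.ndarray (one per worker, all the same shape).
    Returns the element-wise mean — the result every worker would
    receive from an MPI_Allreduce followed by a divide-by-world-size.
    """
    stacked = np.stack(grads)      # shape: (n_workers, *grad_shape)
    return stacked.mean(axis=0)    # reduce across the worker axis

# Each simulated worker computes a gradient on its own data shard,
# then all replicas apply the same averaged update.
worker_grads = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
avg = allreduce_mean(worker_grads)   # every replica gets [2.0, 3.0]
```

With real MPI this single call would be replaced by `mpi4py`'s `comm.Allreduce` so no worker has to gather all gradients in one place.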

Machine learning

  • Paligemma-Google: implemented Google's PaliGemma vision-language model from scratch, following the paper

  • Transformer: implemented the Transformer language model from scratch, following Google's paper

  • Mixture of Experts: a Mixture of Experts (MoE) model with a focus on efficient routing and expert utilization

  • Triton/CUDA kernels in my free time :)
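
The "efficient routing" piece of the MoE project above boils down to picking a few experts per token and weighting their outputs. A hedged NumPy sketch of top-2 routing (illustrative only — the project's actual router may differ):

```python
import numpy as np

def top2_route(logits):
    """Top-2 MoE routing sketch.

    logits: (tokens, n_experts) router scores.
    Returns (expert_ids, gates), both shaped (tokens, 2): the two
    chosen experts per token and their softmax-normalized mixing
    weights, so the selected experts' outputs can be combined.
    """
    expert_ids = np.argsort(logits, axis=-1)[:, -2:][:, ::-1]   # best expert first
    gate_logits = np.take_along_axis(logits, expert_ids, axis=-1)
    gates = np.exp(gate_logits - gate_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)                  # sum to 1 per token
    return expert_ids, gates

router_logits = np.array([[0.1, 2.0, 0.3, 1.0]])  # one token, four experts
expert_ids, gates = top2_route(router_logits)      # experts 1 and 3 are chosen
```

Only the two selected experts run for each token, which is what keeps MoE compute sub-linear in the total number of experts.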

Connect with Me 📬

  • 🐦 Twitter
  • 📫 Email
  • 🔗 LinkedIn

I'm looking forward to collaborating on projects at the intersection of technology and social good. Let's connect! 🌍

Pinned

  1. intel/neural-compressor (Public)

    SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

    Python 2.4k 265

  2. vllm-project/llm-compressor (Public)

    Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

    Python 1.3k 123

  3. bitsandbytes-foundation/bitsandbytes (Public)

    Accessible large language models via k-bit quantization for PyTorch.

    Python 7k 691

  4. GPU-sanghathan (Public)

    Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.

    Python 3

  5. Paligemma (Public)

    Vision-language model

    Python 2

  6. Cuda_writer (Public)

    Distributed training

    Jupyter Notebook 1
