I'm Sai Manohar Vemuri, a Ph.D. student and AI researcher with a strong passion for cutting-edge research in deep learning, neural architecture search (NAS), and edge AI optimization. My research is deeply focused on developing advanced techniques for object detection, semantic segmentation, and sensor fusion in autonomous systems, while also emphasizing efficient deployment on a variety of hardware platforms such as FPGAs, GPUs, and embedded devices.
"Optimizing intelligence: from Voxel Grids to Silicon Gates."
- Object Detection: Implementing and customizing advanced detection pipelines using YOLOv4/v8, EfficientDet, and Faster R-CNN, fine-tuned for high-throughput scenarios in autonomous driving.
- Semantic & Instance Segmentation: Leveraging Mask R-CNN, DeepLabV3+, and SegFormer with custom loss functions and domain-specific datasets (e.g., pothole detection, RSCD).
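As a quick, generic illustration of the detection side of this work (not the customized high-throughput pipeline described above), the sketch below runs a pretrained torchvision Faster R-CNN on a single image; the input path `frame.jpg` is a hypothetical placeholder.

```python
# Minimal sketch: run a pretrained Faster R-CNN detector on one image.
# Generic torchvision example only; "frame.jpg" is a hypothetical input.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = to_tensor(Image.open("frame.jpg").convert("RGB"))

with torch.no_grad():
    predictions = model([image])[0]

# Keep confident detections only.
keep = predictions["scores"] > 0.5
boxes = predictions["boxes"][keep]
labels = predictions["labels"][keep]
print(boxes.shape, labels.tolist())
```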
- Generative AI & Augmentation:
  - Using GANs (CycleGAN, StyleGAN2) for domain translation and dataset augmentation.
  - Synthetic data generation pipelines to improve model robustness under rare edge cases.
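To make the augmentation idea concrete, here is a minimal sketch of a dataset wrapper that occasionally passes a training image through a domain-translation generator while keeping its label. `DummyTranslator`, `GanAugmentedDataset`, and the mixing probability are illustrative assumptions, not the actual pipeline.

```python
# Sketch of GAN-based augmentation: with some probability, an image is passed
# through a (pretrained) domain-translation generator, keeping its label.
# DummyTranslator stands in for a real CycleGAN-style generator checkpoint.
import random
import torch
from torch import nn
from torch.utils.data import Dataset

class DummyTranslator(nn.Module):
    """Placeholder for a pretrained image-to-image generator."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

class GanAugmentedDataset(Dataset):
    """Wraps a labelled image dataset and sometimes returns a translated view."""
    def __init__(self, base_dataset, translator, translate_prob=0.2):
        self.base = base_dataset
        self.translator = translator.eval()
        self.translate_prob = translate_prob

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        image, label = self.base[idx]
        if random.random() < self.translate_prob:
            with torch.no_grad():
                image = self.translator(image.unsqueeze(0)).squeeze(0)
        return image, label
```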
- Transformer-Based Architectures:
  - 3D Voxel Transformers for LiDAR and point cloud understanding.
  - Custom spatial attention modules for handling sparse 3D data in perception stacks.
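The toy sketch below illustrates the basic idea behind attention over sparse voxel grids: a standard multi-head attention layer whose keys are masked to occupied voxels. Dimensions and occupancy rates are arbitrary; this is not the custom spatial attention module mentioned above.

```python
# Minimal sketch: multi-head self-attention restricted to non-empty voxels,
# the basic idea behind attending over sparse 3D data (illustrative only).
import torch
from torch import nn

class SparseVoxelAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, voxel_feats, occupancy_mask):
        # voxel_feats: (B, N, C) features for N voxel cells
        # occupancy_mask: (B, N) True where the voxel contains points
        # Empty voxels are excluded as keys, so occupied voxels attend
        # only to other occupied voxels.
        key_padding_mask = ~occupancy_mask
        out, _ = self.attn(voxel_feats, voxel_feats, voxel_feats,
                           key_padding_mask=key_padding_mask)
        return self.norm(voxel_feats + out)

# Toy usage with random data.
feats = torch.randn(2, 100, 64)
mask = torch.rand(2, 100) > 0.7          # roughly 30% of voxels occupied
print(SparseVoxelAttention()(feats, mask).shape)  # torch.Size([2, 100, 64])
```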
- Multi-Modal Fusion:
  - Late fusion with feature concatenation from LiDAR + RGB camera modalities.
  - DepthBoost + Uncertainty-Aware Fusion for enhancing LiDAR-sparse regions.
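A minimal sketch of the late-fusion idea: per-object feature vectors from a LiDAR branch and an RGB branch are concatenated and passed through a small fusion head. `LateFusionHead` and its dimensions are placeholders, and this does not reflect DepthBoost or the uncertainty-aware variant.

```python
# Minimal late-fusion sketch: LiDAR and RGB feature vectors are concatenated
# and passed through a small fusion head. Dimensions are arbitrary placeholders.
import torch
from torch import nn

class LateFusionHead(nn.Module):
    def __init__(self, lidar_dim=256, rgb_dim=512, hidden=256, num_classes=10):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(lidar_dim + rgb_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, lidar_feat, rgb_feat):
        # Both inputs are per-object (or per-proposal) feature vectors.
        return self.fuse(torch.cat([lidar_feat, rgb_feat], dim=-1))

head = LateFusionHead()
logits = head(torch.randn(8, 256), torch.randn(8, 512))
print(logits.shape)  # torch.Size([8, 10])
```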
- Point Cloud Learning:
  - Deep learning on voxelized LiDAR using VoxelNet, PointPillars, Point Transformer, and my custom VoxelPothNet.
  - Fusing multiple frames for temporal consistency in dynamic environments.
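For context, voxelization is the preprocessing step these models share. The sketch below scatters raw LiDAR points into a fixed grid and averages per-voxel features; the grid extents, resolution, and the `voxelize` helper are placeholder assumptions, not tuned for any particular dataset.

```python
# Sketch of simple point-cloud voxelization: scatter raw LiDAR points into a
# fixed 3D grid and keep per-voxel mean features. Extents/resolution are
# placeholder values for illustration only.
import torch

def voxelize(points, voxel_size=(0.2, 0.2, 0.2),
             pc_range=(-40.0, -40.0, -3.0, 40.0, 40.0, 1.0)):
    """points: (N, 4) tensor of x, y, z, intensity."""
    mins = torch.tensor(pc_range[:3])
    maxs = torch.tensor(pc_range[3:])
    size = torch.tensor(voxel_size)

    # Drop points outside the range of interest.
    in_range = ((points[:, :3] >= mins) & (points[:, :3] < maxs)).all(dim=1)
    pts = points[in_range]

    # Integer voxel coordinates for every remaining point.
    coords = ((pts[:, :3] - mins) / size).long()

    # Group points that fall into the same voxel and average their features.
    unique_coords, inverse = torch.unique(coords, dim=0, return_inverse=True)
    feats = torch.zeros(len(unique_coords), points.shape[1])
    counts = torch.zeros(len(unique_coords), 1)
    feats.index_add_(0, inverse, pts)
    counts.index_add_(0, inverse, torch.ones(len(pts), 1))
    return unique_coords, feats / counts

coords, feats = voxelize(torch.randn(10000, 4) * 10)
print(coords.shape, feats.shape)
```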
- Real-Time 3D Pipeline Integration:
  - Building full-stack pipelines in ROS 2 and deploying with CARLA and Autoware.AI for simulation and testing.
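A minimal rclpy sketch of where a perception model plugs into such a pipeline: a node subscribes to a LiDAR topic and hands each cloud to an inference callback. The node name, the topic `/lidar/points`, and the stubbed callback are assumptions for illustration.

```python
# Minimal ROS 2 (rclpy) sketch: subscribe to a LiDAR topic and hand each cloud
# to a perception callback. The topic name is an assumption; inference is a stub.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import PointCloud2

class PerceptionNode(Node):
    def __init__(self):
        super().__init__("perception_node")
        self.subscription = self.create_subscription(
            PointCloud2, "/lidar/points", self.on_cloud, 10)

    def on_cloud(self, msg: PointCloud2):
        # Run the 3D detection / segmentation model here.
        self.get_logger().info(f"received cloud with {msg.width * msg.height} points")

def main():
    rclpy.init()
    node = PerceptionNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == "__main__":
    main()
```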
- Optimizing AI models for edge devices such as FPGAs, NVIDIA Jetson platforms, and mobile systems to achieve low-power, high-efficiency real-time processing.
- Implementing HW/SW co-design principles to efficiently deploy deep learning models on edge devices, using techniques such as quantization, model pruning, and neural architecture search (NAS); a minimal quantization sketch follows below.
- Fusing LiDAR and camera data to improve the robustness and accuracy of AI models for autonomous systems, while ensuring these models remain optimized for low-latency deployment.
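As a small, concrete example of one of these techniques, the sketch below applies PyTorch post-training dynamic quantization to a stand-in model; the real deployment flow (TensorRT, Vitis AI, and so on) involves far more than this.

```python
# Post-training dynamic quantization sketch (PyTorch): Linear layers are
# converted to int8 kernels to shrink the model and speed up CPU inference.
# The model here is a stand-in, not the actual perception network.
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
).eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
print(quantized)           # Linear layers now appear as dynamically quantized modules
```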
- Target Hardware Platforms:
  - NVIDIA Jetson Family (Nano, Xavier NX, AGX Orin)
  - NVIDIA DRIVE PX2 & Pegasus
  - Xilinx Zynq UltraScale+ FPGAs (ZCU102/ZCU104)
  - ASIC prototypes for low-power model acceleration
- Developing cutting-edge solutions for robust audio watermarking, ensuring the authenticity and traceability of audio signals in dynamic environments.
- Working on real-time detection systems for AI-generated speech, including deepfakes, to address the emerging challenges of synthetic-voice detection.
- Researching techniques for protecting audio content from unauthorized use, leveraging advanced signal processing and machine learning methods.
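To illustrate the general idea only (not the actual scheme under development), here is a classic additive spread-spectrum watermark: a keyed pseudorandom sequence is embedded at low amplitude and later detected by correlation. The `embed_watermark`/`detect_watermark` helpers and their parameters are hypothetical.

```python
# Sketch of classic additive spread-spectrum audio watermarking: a keyed
# pseudorandom sequence is added at low amplitude and detected by correlation.
# Illustrative of the general technique only.
import numpy as np

def embed_watermark(audio, key, strength=0.005):
    rng = np.random.default_rng(key)
    sequence = rng.choice([-1.0, 1.0], size=audio.shape)
    return audio + strength * sequence, sequence

def detect_watermark(audio, key, threshold=0.0025):
    # Threshold is set to half the embedding strength.
    rng = np.random.default_rng(key)
    sequence = rng.choice([-1.0, 1.0], size=audio.shape)
    correlation = float(np.mean(audio * sequence))
    return correlation > threshold, correlation

# Toy signal instead of a real recording.
host = 0.1 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
marked, _ = embed_watermark(host, key=42)
print(detect_watermark(marked, key=42))  # (True, ~0.005)
print(detect_watermark(host, key=42))    # (False, ~0)
```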
- Using Neural Architecture Search (NAS) to discover efficient model architectures tailored for specific tasks and hardware platforms.
- Applying quantization, knowledge distillation, and pruning techniques to reduce model size and improve inference speed, making models suitable for edge devices without compromising on performance.
- Exploring novel methods for optimizing models at both the architectural and hardware level to improve deployment efficiency, especially on low-power devices like FPGAs and embedded systems.
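Two of these steps in miniature: L1 magnitude pruning via `torch.nn.utils.prune` and a temperature-scaled knowledge-distillation loss. The student/teacher layers and all numbers are placeholders, not the actual compression pipeline.

```python
# Sketch of two compression steps mentioned above: magnitude pruning with
# torch.nn.utils.prune and a temperature-scaled knowledge-distillation loss.
import torch
import torch.nn.functional as F
from torch import nn
from torch.nn.utils import prune

student = nn.Linear(128, 10)
teacher = nn.Linear(128, 10)

# Remove 50% of the smallest-magnitude weights in the student layer.
prune.l1_unstructured(student, name="weight", amount=0.5)
print(float((student.weight == 0).float().mean()))  # ~0.5 sparsity

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

x = torch.randn(32, 128)
loss = distillation_loss(student(x), teacher(x).detach())
print(loss.item())
```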
- Run latency-aware NAS to automatically find optimal model architectures for FPGA/Jetson targets.
- Simulate RTL-level designs and integrate model compute graphs with HLS pipelines.
- Apply power-saving hardware techniques like clock gating, operand isolation, and dynamic voltage scaling for ASIC modeling.
- Export models using ONNX, optimize with TensorRT or Vitis AI Compiler, and deploy them on:
  - Jetson using DeepStream SDK
  - FPGAs using custom AXI4 IPs and PetaLinux
  - ASICs through co-simulation flows
- Validate system-wide KPIs: FPS, latency, throughput, power, and accuracy with real-world testbeds.
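A sketch of the export-and-validate tail of this workflow: a placeholder model is exported to ONNX and a rough CPU latency/FPS number is measured with onnxruntime (assumed installed). TensorRT/Vitis AI compilation and on-target power measurement are outside this snippet.

```python
# Sketch: export a stand-in model to ONNX and measure rough CPU latency/FPS
# with onnxruntime. TensorRT / Vitis AI steps are not shown here.
import time
import torch
import onnxruntime as ort
from torch import nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(16, 10)).eval()

dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "model.onnx", opset_version=13,
                  input_names=["input"], output_names=["logits"])

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
inputs = {"input": dummy.numpy()}

# Warm up, then time repeated runs for a rough latency figure.
for _ in range(10):
    session.run(None, inputs)
start = time.perf_counter()
runs = 100
for _ in range(runs):
    session.run(None, inputs)
latency_ms = (time.perf_counter() - start) / runs * 1000
print(f"mean latency: {latency_ms:.2f} ms  (~{1000 / latency_ms:.1f} FPS)")
```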
Below is a high-level overview of my research workflow, spanning the full AI stack, from model design to hardware deployment:
Feel free to explore my repositories and projects to see how I tackle real-world problems using these advanced techniques. I am always looking for new challenges and opportunities to collaborate on impactful AI research!