Stars
BS::thread_pool: a fast, lightweight, modern, and easy-to-use C++17 / C++20 / C++23 thread pool library
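As a quick illustration of how such a pool is used, here is a minimal sketch assuming the v4 API, where `submit_task` returns a `std::future` (treat the exact names as an assumption based on that version's documentation):

```cpp
// Minimal sketch of submitting work to BS::thread_pool (assumes the v4 API).
// Compile with -std=c++17 or later; the library is a single header.
#include "BS_thread_pool.hpp"
#include <future>
#include <iostream>

int main() {
    BS::thread_pool pool(4); // pool with 4 worker threads

    // Submit a task; the returned future lets us wait for the result.
    std::future<int> result = pool.submit_task([] { return 6 * 7; });

    std::cout << "task result: " << result.get() << '\n'; // prints 42
}
```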
This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai
MatMul performance benchmarks for a single CPU core, comparing both hand-engineered and codegen kernels.
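For context, kernels in benchmarks like this are typically measured against a naive baseline such as the following (a generic C++ sketch, not code from the repository):

```cpp
// Naive single-threaded reference matmul: C += A * B, row-major.
// Optimized kernels are usually benchmarked against a baseline like this.
#include <cstddef>
#include <vector>

void matmul_ref(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C,
                std::size_t M, std::size_t K, std::size_t N) {
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t k = 0; k < K; ++k) {
            const float a = A[i * K + k];
            for (std::size_t j = 0; j < N; ++j)
                C[i * N + j] += a * B[k * N + j]; // i-k-j order keeps B and C accesses sequential
        }
}
```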
A high-throughput and memory-efficient inference and serving engine for LLMs
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
A curated list of Large Language Model resources, covering model training, serving, fine-tuning, and building LLM applications.
Official inference framework for 1-bit LLMs
An open-source RAG-based tool for chatting with your documents.
The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for float…
INACTIVE (archived; see http://mzl.la/ghe-archive) - FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
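A short example of Eigen's everyday usage (standard public API; the specific system solved here is made up for illustration):

```cpp
// Solving a small dense linear system Ax = b with Eigen.
#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::Matrix3d A;
    A << 2, -1,  0,
        -1,  2, -1,
         0, -1,  2;
    Eigen::Vector3d b(1, 0, 1);

    // Column-pivoting Householder QR: a robust general-purpose dense solver.
    Eigen::Vector3d x = A.colPivHouseholderQr().solve(b);
    std::cout << "x =\n" << x << '\n';
}
```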
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
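OpenBLAS implements the standard CBLAS interface, so a single-precision GEMM call looks like this (standard CBLAS signature; link with -lopenblas):

```cpp
// C = alpha * A * B + beta * C through the CBLAS interface OpenBLAS provides.
#include <cblas.h>
#include <iostream>
#include <vector>

int main() {
    const int M = 2, K = 3, N = 2;
    std::vector<float> A = {1, 2, 3,
                            4, 5, 6};   // M x K, row-major
    std::vector<float> B = {7,  8,
                            9, 10,
                           11, 12};     // K x N, row-major
    std::vector<float> C(M * N, 0.0f);  // M x N result

    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K,
                1.0f, A.data(), K,      // lda = K for row-major A
                B.data(), N,            // ldb = N
                0.0f, C.data(), N);     // ldc = N

    for (float v : C) std::cout << v << ' '; // 58 64 139 154
    std::cout << '\n';
}
```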
TinyChatEngine: On-Device LLM Inference Library
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots
Accessible large language models via k-bit quantization for PyTorch.
Encapsulates frequently used AVX instructions as independent modules to reduce repeated development work.
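To illustrate the pattern of wrapping raw intrinsics behind a reusable helper, here is a generic sketch (my own example, not code from the repository):

```cpp
// A tiny module wrapping AVX intrinsics: elementwise float addition.
// Compile with -mavx on GCC/Clang.
#include <immintrin.h>
#include <cstddef>

void add_f32(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {                  // 8 floats per 256-bit register
        const __m256 va = _mm256_loadu_ps(a + i); // unaligned loads
        const __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i)                            // scalar tail for leftover elements
        out[i] = a[i] + b[i];
}
```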
High-efficiency floating-point neural network inference operators for mobile, server, and Web