Stars
A C++/CUDA template library for tensor lazy evaluation
A compiler for ARM, x86, MSP430, Xtensa and more, implemented in pure Python
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
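A summary of intelligent question answering research and applications, compiled by the natural language processing research team of the Advanced Innovation Center for Big Data at Beihang University. It covers knowledge-graph-based QA (KBQA), text-based QA (TextQA), table-based QA (TableQA), visual QA (VisualQA), and machine reading comprehension (MRC), with separate summaries of academic and industrial work for each task type.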
A computing kernel implementation for ML inference frameworks, aiming at the theoretical limit
Structural implementation of key RL algorithms
Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.
🧪 single header unit testing framework for C and C++
MFixedPoint is a header-only fixed-point C++ library suitable for fast arithmetic operations on systems which don't have an FPU (e.g. embedded systems). Suitable for performing computationally inte…
M^3SNet: Unsupervised Multi-metric Multi-view Stereo Network
Small, portable implementation of the C11 threads API
A code generator for array-based code on CPUs and GPUs
High-efficiency floating-point neural network inference operators for mobile, server, and Web
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
Convert TensorFlow, Keras, TensorFlow.js and TFLite models to ONNX
A template for modern C++ projects using CMake, Clang-Format, CI, unit testing and more, with support for downstream inclusion.
Jump to any definition and references 👁 IDE madness without overhead 🚀