- Tsinghua University
- Beijing
- UTC +08:00
- http://yujia-qin.github.io/
- https://twitter.com/
Stars
c/ua is the Docker Container for Computer-Use AI Agents.
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon!
VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
A series of technical reports on Slow Thinking with LLMs
verl: Volcano Engine Reinforcement Learning for LLMs
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton
Utilities intended for use with Llama models.
Agentic components of the Llama Stack APIs
Memory for AI Agents; SOTA in AI Agent Memory; Announcing OpenMemory MCP - local and secure memory management.
🍎APPL: A Prompt Programming Language. Seamlessly integrate LLMs with programs.
Code examples and resources for DBRX, a large language model developed by Databricks
A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.
DeepSeek-VL: Towards Real-World Vision-Language Understanding
The ScreenQA dataset was introduced in the "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots" paper. It contains ~86K question-answer pairs collected by human annotators for ~35K…
Repo for paper "Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents"
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
A keyboard shortcut browser extension for keyboard-based navigation and tab operations with an advanced omnibar
🦜🔗 Build context-aware reasoning applications
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models