Harvard University
Boston, MA
Stars
A computer algebra system written in pure Python
ML Benchmarks in Algebraic Combinatorics
Petehr: Your Personal Assistant for Creating Research-Ready EHR Data. A Python Toolkit for EHR Processing.
A flexible and efficient training framework for large-scale alignment tasks
An Open-Source Package for Knowledge Embedding (KE)
Clinical Histopathology Imaging Evaluation Foundation Model
A modular graph-based Retrieval-Augmented Generation (RAG) system
MambaOut: Do We Really Need Mamba for Vision? (CVPR 2025)
SWE-bench [Multimodal]: Can Language Models Resolve Real-world Github Issues?
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
huybery / OpenDevin
Forked from All-Hands-AI/OpenHands
OpenDevin: Code Less, Make More
A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
Yuanhy1997 / ehrdiff
Forked from sczzz3/EHRDiff
EHRDiff: Exploring Realistic EHR Synthesis with Diffusion Models [TMLR]
Text Diffusion Model with Encoder-Decoder Transformers for Sequence-to-Sequence Generation [NAACL 2024]
OpenHands: Code Less, Make More
Dataset and modelling infrastructure for modelling "event streams": sequences of continuous time, multivariate events with complex internal dependencies.
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
MiniCPM4: Ultra-Efficient LLMs on End Devices, achieving 5+ speedup on typical end-side chips
DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
[NeurIPS 2023] Official code for "MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data"
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings.
A self-alignment method for role-play. Benchmark for role-play. Resources for "Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment".