- University of Warsaw
- Warsaw, Poland
- https://syzymon.github.io
- @s_tworkowski
- https://scholar.google.com/citations?user=1V8AeXYAAAAJ&hl=en
Stars
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
[ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training
Your pair programming wingman. Supports OpenAI, Anthropic, or any LLM on your local inference server.
A collection of awesome prompt and instruction datasets (awesome-prompt-datasets, awesome-instruction-dataset) for training chat LLMs such as ChatGPT; gathers a wide variety of instruction datasets for training ChatLLM-style models.
Code and data for "MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning" [ICLR 2024]
ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools [ICLR'24].
GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.
The official repo of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
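A minimal usage sketch for LongLLaMA, assuming the publicly released syzymon/long_llama_3b checkpoint on Hugging Face and the standard transformers generation API; `trust_remote_code=True` is needed because the Focused Transformer attention ships as custom modeling code with the checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

# Checkpoint name assumed from the LongLLaMA release on the Hugging Face Hub.
checkpoint = "syzymon/long_llama_3b"

tokenizer = LlamaTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.float32,
    trust_remote_code=True,  # loads the custom FoT attention modules
)

# Standard greedy generation on a short illustrative prompt.
prompt = "The Focused Transformer extends the effective context length by"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```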
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
A framework for the evaluation of autoregressive code generation language models.
General technology for enabling AI capabilities with LLMs and MLLMs.
A high-throughput and memory-efficient inference and serving engine for LLMs
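A minimal sketch of vLLM's offline batched inference interface, using a small placeholder model name for illustration; the `LLM` and `SamplingParams` entry points are the library's standard Python API.

```python
from vllm import LLM, SamplingParams

# Any Hugging Face causal LM id works here; opt-125m is just a small example.
llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "The capital of Poland is",
    "Long-context language models are useful because",
]

# Requests are batched and scheduled internally (PagedAttention KV cache).
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```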
Long-context pretrained encoder-decoder models
jax-triton contains integrations between JAX and OpenAI Triton
OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
A curated list of awesome papers related to pre-trained models for information retrieval (a.k.a., pretraining for IR).