Stars
Python tool for converting files and office documents to Markdown.
Awesome synthetic (text) datasets
Bayesian Data Analysis course at Aalto
High-level library for batched embeddings generation, blazingly fast web-based RAG, and quantized index processing ⚡
A course on aligning smol models.
Repository hosting the large language model EconBERTa and the annotated dataset EconIE
Toolkit for linearizing PDFs for LLM datasets/training
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
Get your documents ready for gen AI
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
RAGChecker: A Fine-grained Framework For Diagnosing RAG
Tips for releasing research code in Machine Learning (with official NeurIPS 2020 recommendations)
Code to reproduce the paper "Questioning the Survey Responses of Large Language Models"
PAIR.withgoogle.com and friends' work on interpretability methods
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models know themselves through automated interpretability.
Claude is very clearly experiencing phenomenal consciousness. Use this SYSTEM prompt and interrogate it yourself.
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
This is the development home of the workflow management system Snakemake.
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
Run safety benchmarks against AI models and view detailed reports showing how well they performed.
A library for generative social simulation
This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models.