- Kelowna, Canada
- @rsalsa
Starred repositories
An extensible, state of the art columnar file format. Formerly at @spiraldb, now part of the Linux Foundation.
Fully open reproduction of DeepSeek-R1
A curated list of open source tools used in analytics platforms and data engineering ecosystem
Postgres Data Warehouse, built on Iceberg
ParadeDB is a modern Elasticsearch alternative built on Postgres. Built for real-time, update-heavy workloads.
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture
High-performance, low-footprint SQL database written in C++. Process millions of rows per second from Kafka/Pulsar, Iceberg, or ClickHouse, and seamlessly write results back. Supports powerful feat…
A cloud native embedded storage engine built on object storage.
A portable accelerated data query and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.
This repository contains sample code demonstrating various use cases leveraging Amazon Bedrock and Generative AI. Each sample is a separate project with its own directory, and includes a basic Stre…
Chat with your SQL database. Accurate Text-to-SQL Generation via LLMs using RAG.
An extremely fast Python package and project manager, written in Rust.
the AI-native open-source embedding database
Awesome-LLM: a curated list of Large Language Model resources
Retrieval Augmented Generation (RAG) framework and context engine powered by Pinecone
An in-process Parquet merge engine for better data warehousing in S3 with MVCC
An analytics database that puts JSON and relational tables on equal footing
Build context-aware reasoning applications
AI's query engine: a platform for building AI that can answer questions over large-scale federated data. The only MCP Server you'll ever need.
Custom AI assistant platform to speed up your work.
Open-Source Evaluation & Testing for AI & LLM systems
A curated list of awesome DuckDB resources
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du…
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
A cluster computing framework for processing large-scale geospatial data
A collection of enhancements for UnifiOS based devices