Factor House Local v1.0: A Suite of Modern Data Platforms
We are excited to announce the latest updates to Factor House Local, a collection of pre-configured Docker Compose environments designed to showcase modern data platform setups. This release enhances our local development stacks, providing robust, ready-to-use environments for a variety of data-intensive applications.
Highlights
Comprehensive Kafka Development Stack with Kpow
This environment provides a complete Apache Kafka ecosystem for development and monitoring. It is built around a 3-node Kafka cluster for high availability, with ZooKeeper for coordination, Confluent Schema Registry for data governance, and Kafka Connect for data integration. Kpow, an enterprise-grade UI, adds monitoring, data inspection, and management of your Kafka resources, which makes the stack well suited to developing and testing event-driven architectures and microservices.
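Once a stack like this is running, a quick smoke test is to confirm that each service accepts TCP connections. The following is a minimal sketch using only the Python standard library. The host and port pairs are assumptions based on each tool's usual defaults, not values taken from the Factor House Local compose files, so adjust them to match your setup.

```python
import socket

# Hypothetical host/port pairs: these are the tools' usual defaults,
# not values confirmed by the Factor House Local compose files.
SERVICES = {
    "kafka": ("localhost", 9092),
    "schema-registry": ("localhost", 8081),
    "kafka-connect": ("localhost", 8083),
    "kpow": ("localhost", 3000),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_services(services):
    """Map each service name to whether its port accepts connections."""
    return {
        name: is_reachable(host, port)
        for name, (host, port) in services.items()
    }


if __name__ == "__main__":
    for name, up in check_services(SERVICES).items():
        print(f"{name:16} {'up' if up else 'DOWN'}")
```

An open port only shows that a process is listening; it does not prove that the brokers have formed a cluster or that Kpow has finished connecting, so treat this as a first check rather than a health guarantee.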
Real-Time Stream Analytics with Flink & Flex
Centered on Apache Flink, this stack delivers a high-performance solution for streaming analytics. It is tailored for low-latency processing, complex event enrichment, and SQL-driven operations. The environment includes a Flink JobManager, multiple TaskManagers, and a SQL Gateway for interactive queries. It is managed by Flex, an enterprise-grade tool for Flink that provides robust RBAC, a data-oriented UI, and simplified management. The stack suits operational intelligence, advanced fraud detection, and real-time metric pipelines.
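To illustrate what the SQL Gateway enables, the sketch below submits a statement through the gateway's REST API using only the Python standard library. It assumes the gateway is published on localhost:8083 (Flink's default REST port for the gateway); the port mapping in this stack may differ, and `build_request`, `call`, and `submit_statement` are hypothetical helper names introduced for illustration.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the SQL Gateway REST endpoint is reachable at this address.
GATEWAY_URL = "http://localhost:8083"


def build_request(base_url: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a JSON request: POST when a payload is given, GET otherwise."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if payload is None else "POST",
    )


def call(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


def submit_statement(base_url: str, statement: str) -> str:
    """Open a session, submit one statement, and return the path
    from which the first batch of results can be fetched."""
    session = call(build_request(base_url, "/v1/sessions", {}))["sessionHandle"]
    operation = call(build_request(
        base_url,
        f"/v1/sessions/{session}/statements",
        {"statement": statement},
    ))["operationHandle"]
    return f"/v1/sessions/{session}/operations/{operation}/result/0"


if __name__ == "__main__":
    result_path = submit_statement(GATEWAY_URL, "SHOW CATALOGS")
    print(call(build_request(GATEWAY_URL, result_path)))
```

Statements run asynchronously, so the first fetch may report that results are not ready yet; a fuller client would poll the result path until data arrives.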
Modern Analytics & Lakehouse with Spark and Iceberg
This stack provides a self-contained environment for building and querying data lakehouses. It combines Apache Spark with Apache Iceberg, an open table format that adds transactional data management. Data is stored in MinIO, an S3-compatible object storage layer, and a PostgreSQL database is included for relational data workloads. This architecture is well suited to batch ETL/ELT pipelines, interactive data exploration via the included Jupyter Notebook server, and reliable, scalable analytics pipelines with ACID transactions, schema evolution, and time-travel capabilities.
Apache Pinot Real-Time OLAP Cluster
Deploy a distributed cluster of Apache Pinot, a real-time OLAP (Online Analytical Processing) datastore designed for ultra-low-latency analytics at scale. The stack includes the core Pinot components (Controller, Broker, and Server), providing the foundation to ingest data from streaming sources like Kafka and make it available for analytical queries with millisecond response times. It is optimized for user-facing analytics, real-time dashboards, and anomaly detection.
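Analytical queries go through the Broker, which accepts SQL over HTTP. The sketch below posts a query to the broker's `/query/sql` endpoint and turns the response into a list of dictionaries. It assumes the broker is published on localhost:8099 (Pinot's default broker port), and the table name in the usage example is hypothetical.

```python
import json
import urllib.request

# Assumption: the Pinot broker is reachable at this address.
BROKER_URL = "http://localhost:8099"


def rows_as_dicts(response: dict) -> list:
    """Convert a Pinot broker response into a list of {column: value} dicts."""
    table = response.get("resultTable") or {}
    columns = table.get("dataSchema", {}).get("columnNames", [])
    return [dict(zip(columns, row)) for row in table.get("rows", [])]


def query_pinot(sql: str, broker_url: str = BROKER_URL) -> list:
    """POST a SQL query to the broker and return the rows as dicts."""
    request = urllib.request.Request(
        broker_url.rstrip("/") + "/query/sql",
        data=json.dumps({"sql": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return rows_as_dicts(json.load(resp))


if __name__ == "__main__":
    # "orders" is a hypothetical table name used only for illustration.
    for row in query_pinot("SELECT COUNT(*) AS total FROM orders"):
        print(row)
```

Keeping the response parsing in its own function makes it easy to test against a saved response without a running cluster.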
New Custom Flink Docker Image
We are introducing factorhouse/flink, a custom, multi-architecture (amd64, arm64) Docker image based on the Apache Flink LTS release. This image is optimized for running Flink SQL and PyFlink jobs and comes with out-of-the-box support for S3, Hadoop, Apache Iceberg, and Parquet. It features a custom dependency loading mechanism that simplifies using the Flink SQL Client and Gateway by automatically adding pre-packaged JARs to the classpath, which streamlines the development of complex data pipelines.