The Kimball API is a self-hosted analytics service that takes events from your application and generates predictions about user behavior.
Talk to us about how we can help with your analytics/prediction problems!
See our documentation for how to install and use the Kimball API. The rest of this file covers development of the Kimball API itself.
Use nix! The shell.nix file contains the required development tools. Use nix-shell to get started:
make deps app shell
make live-js
export TAG=kimball
docker build -t $TAG .
docker run -v "${PWD}/config/example.config:/kimball/app.config" -it -e LOG_LEVEL=debug "$TAG"
An Erlang configuration file is loaded from /kimball/app.config as part of the Release/Dockerfile. A user configuration file can be placed there to provide the more complex configuration described below.
Information about the deployment can be included to help diagnose future problems. Currently this is the "site" and "cluster" information:
[{features, [
{site, "Site name"},
{cluster, "Cluster name"}
]}].
The initial bloom filter type, size, and error probability can be configured for counters. This is configured as a list of entries, each with a regular expression (pattern) matching the name of the filter. Options:
pattern - The regular expression
type - bloom_fixed_size or bloom_scalable
date_cohort - weekly, or omitted. Whether to automatically generate counters for each week to track events over time.
size - The fixed size (or the initial size, for bloom_scalable filters)
error_probability - Bloom filter error probability
Example
[{features, [
{counters, #{
init => [
#{pattern => ".*",
type => bloom_fixed_size,
date_cohort => weekly,
size => 10000,
error_probability => 0.01}
]
}}
]}].
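To get a feel for what size and error_probability imply, here is a Python sketch of the textbook sizing formulas for a classic Bloom filter. It assumes that size is the expected number of elements and that bloom_fixed_size filters follow the standard construction; the source does not say how the underlying filter library interprets these parameters, so treat the results as ballpark figures only. bloom_parameters is a hypothetical helper, not part of Kimball.

```python
import math


def bloom_parameters(n: int, p: float) -> tuple:
    """Classic Bloom filter sizing (assumed, not confirmed for Kimball).

    n: expected number of elements (assumed to correspond to `size`)
    p: target false-positive rate (assumed to correspond to `error_probability`)
    Returns (m, k): number of bits and number of hash functions.
    """
    if n <= 0 or not 0 < p < 1:
        raise ValueError("need n > 0 and 0 < p < 1")
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole bit
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to the nearest whole number (at least 1)
    k = max(1, round(m / n * math.log(2)))
    return m, k


if __name__ == "__main__":
    bits, hashes = bloom_parameters(10000, 0.01)
    print(f"{bits} bits (~{bits / 8 / 1024:.1f} KiB), {hashes} hash functions")
```

With the example values above (size => 10000, error_probability => 0.01) this works out to roughly 96,000 bits (about 12 KB) and 7 hash functions per filter, under the same assumptions.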
The API requires authentication by default. Options to configure:
api_auth - either enable or disable (default enable)
api_auth_tokens - A list of binaries that are accepted for authorization. These should be provided in an authorization header of the form:
Authorization: Bearer $TOKEN
Example
[{features, [
{api_auth, disable}
]}].
or
[{features, [
{api_auth, enable},
{api_auth_tokens, [<<"SECRET_TOKEN">>]}
]}].
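A minimal Python sketch of both sides of the bearer-token exchange. auth_header builds the header a client sends; is_authorized is a hypothetical illustration of checking that header against a token list like api_auth_tokens with a constant-time comparison. Neither function is Kimball's own implementation.

```python
import hmac
from typing import Dict, Iterable, Optional


def auth_header(token: str) -> Dict[str, str]:
    # Header a client sends when api_auth is enabled.
    return {"Authorization": f"Bearer {token}"}


def bearer_token(header_value: str) -> Optional[str]:
    # Split "Bearer <token>"; the scheme is matched case-insensitively,
    # as HTTP authentication schemes are.
    scheme, _, token = header_value.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def is_authorized(header_value: str, accepted: Iterable[bytes]) -> bool:
    # Hypothetical server-side check. Tokens are bytes because
    # api_auth_tokens is configured as a list of Erlang binaries.
    token = bearer_token(header_value)
    if token is None:
        return False
    candidate = token.encode("utf-8")
    # hmac.compare_digest compares each pair in constant time.
    return any(hmac.compare_digest(candidate, t) for t in accepted)
```

For example, auth_header("SECRET_TOKEN") produces the header that the second configuration above would accept.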
The application can forward event streams via gRPC to external services. This is currently experimental, not well tested, and may carry a significant performance cost.
The proto file is available at src/proto/features_proto.proto.
Configure it as follows:
[{features, [
{external_grpc_event_targets, [{"127.0.0.1", 8079}]}
]}].
Prediction API requests can also ask an external service for predictions and include them in the Kimball API response.
The proto file is available at src/proto/features_proto.proto.
Configure it as follows:
[{features, [
{external_grpc_prediction_targets, [{"service name", "127.0.0.1", 8079}]}
]}].
Environment variables:
ADDITIONAL_NAMESPACES - A comma-separated list of namespaces to sync feature config to. This should include any namespaces where you intend to run sidecars.
ANALYTICS_HOST - Where to forward analytic events if this process isn't storing them directly. This is used in sidecar mode to know which api-mode process to forward to.
API_PORT - (default 8080) Port where the HTTP API will be available.
AWS_ACCESS_KEY_ID - Credentials for interacting with AWS
AWS_SECRET_ACCESS_KEY - Credentials for interacting with AWS
FEATURES_MODE - Which mode to start the application in:
  api (default) - Full-featured API server, storing state in configmaps
  sidecar - Read-only API meant to be deployed as a sidecar. Reads features from the /features/data volume in Kubernetes.
KUBERNETES_MEMORY_LIMIT - The container memory limit in bytes, used for computing the metric memory_remaining_bytes
NAMESPACE - Namespace to use for reading/writing in Kubernetes
S3_BUCKET - AWS S3 bucket to use for storage
S3_HOST - AWS S3 host to use for storage. This will attempt to auto-configure when running in AWS.
GCS_BUCKET - Google Cloud Storage bucket to use for storage
GOOGLE_APPLICATION_CREDENTIALS - Path to a JSON service account key
STORAGE_PATH_PREFIX - Path prefix to use when storing files in S3/GCS. Defaults to the installation namespace.
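The documented defaults can be mirrored in a pre-flight check for a deploy script. This is a hypothetical Python sketch: read_env and KimballEnv are illustrative names, not part of Kimball, and it assumes the "installation namespace" that STORAGE_PATH_PREFIX falls back to is the value of NAMESPACE.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class KimballEnv:
    mode: str
    api_port: int
    namespace: Optional[str]
    additional_namespaces: List[str]
    analytics_host: Optional[str]
    storage_path_prefix: Optional[str]


def read_env(env: Mapping[str, str] = os.environ) -> KimballEnv:
    # FEATURES_MODE defaults to "api"; "sidecar" is the only other documented mode.
    mode = env.get("FEATURES_MODE", "api")
    if mode not in ("api", "sidecar"):
        raise ValueError(f"unknown FEATURES_MODE: {mode!r}")
    namespace = env.get("NAMESPACE")
    # ADDITIONAL_NAMESPACES is comma-separated; ignore blanks and whitespace.
    extra = [ns.strip() for ns in env.get("ADDITIONAL_NAMESPACES", "").split(",") if ns.strip()]
    return KimballEnv(
        mode=mode,
        api_port=int(env.get("API_PORT", "8080")),  # documented default 8080
        namespace=namespace,
        additional_namespaces=extra,
        analytics_host=env.get("ANALYTICS_HOST"),
        # Assumption: "installation namespace" means the NAMESPACE value.
        storage_path_prefix=env.get("STORAGE_PATH_PREFIX", namespace),
    )
```

Passing an explicit dictionary instead of os.environ makes the function easy to check against a planned deployment before rolling it out.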
More advanced, less commonly used configuration can be done via additional app.config options:
counter_startup_delay - int - Milliseconds between starting each counter when the application begins/resets. Used to rate-limit startup and prevent a thundering herd as the application starts.
[{features, [
{counter_startup_delay, 1}
]}].
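Because counters are started one after another, the delay adds up: the last counter starts roughly (number of counters - 1) x counter_startup_delay milliseconds after a (re)start. A small Python sketch of that arithmetic, assuming strictly sequential starts with a fixed delay between each; the helper names are hypothetical.

```python
from typing import List


def startup_offsets_ms(num_counters: int, delay_ms: int) -> List[int]:
    # Assumed model: counter i starts i * delay_ms after the (re)start.
    return [i * delay_ms for i in range(num_counters)]


def startup_window_ms(num_counters: int, delay_ms: int) -> int:
    # Time until the last counter has been started (0 if there are none).
    return max(0, num_counters - 1) * delay_ms


if __name__ == "__main__":
    print(startup_window_ms(10000, 1), "ms until the last counter starts")
```

For example, with the 1 ms delay above and 10,000 counters, the last counter starts about 10 seconds in. The kimball_counters metric gives the number of counters to plug in.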
System metrics are available at /metrics. Some important ones:
kimball_counters - The number of counters registered with the router. Equivalent to the number of events tracked.
kimball_persist_counters_managed - The number of counters the persistence manager triggered in the last run. Should track, but lag, kimball_counters.
Metrics for each counter are available at /metrics/counters:
kimball_counter - Event counters
kimball_counter_weekly - Per-week counters, if date_cohort => weekly is set
Metrics for goal/event predictions are available at /metrics/predictions:
kimball_bayes_prediction - Prediction that users who complete the event label will complete the goal label
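A Python sketch for a health check that compares kimball_counters with kimball_persist_counters_managed. It parses only unlabeled samples and assumes /metrics serves the Prometheus text exposition format (the metric naming suggests it, but the source does not say). fetch_metrics, persistence_lag, and the base URL are hypothetical.

```python
import urllib.request
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    # Collect unlabeled samples ("name value [timestamp]") from
    # Prometheus text exposition format; labeled series are skipped.
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        parts = line.split()
        if len(parts) < 2 or "{" in parts[0]:
            continue
        samples[parts[0]] = float(parts[1])
    return samples


def fetch_metrics(base_url: str) -> Dict[str, float]:
    # Hypothetical: base_url is wherever the service is exposed,
    # e.g. "http://localhost:8080" if /metrics shares API_PORT.
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


def persistence_lag(samples: Dict[str, float]) -> float:
    # How far the last persistence run trails the registered counters.
    return samples["kimball_counters"] - samples["kimball_persist_counters_managed"]
```

A persistently large or growing lag would be the signal to look at, since the two metrics are documented to track each other with some delay.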
git checkout trunk
git pull
TAG=$(date +"%Y.%m.%d")
git tag ${TAG}
git push origin ${TAG}
- Find outdated dependencies with npm outdated and update them in package.json
- Run npm update ... I think
Apache 2.0. Copyright Get Kimball Inc. 2020