This repository was archived by the owner on Oct 31, 2024. It is now read-only.

nineinchnick/trino-faker

Trino Connector

The Faker connector has been part of Trino since release 464.

This is a Trino connector that generates random data. It has three goals:

  1. Be easy to use.
  2. Support most of Trino's data types.
  3. Generate random data that looks as real as possible and is correct, that is, it matches all the constraints.

Quick Start

To run a Docker container with the connector, run the following:

docker run \
  -d \
  --name trino-faker \
  -p 8080:8080 \
  nineinchnick/trino-faker:0.61

Then use your favorite SQL client to connect to Trino running at http://localhost:8080

Try creating a table that looks like an existing table in a real database and insert some random data back into it:

CREATE TABLE faker.default.customers (LIKE production.public.customers EXCLUDING PROPERTIES);
INSERT INTO production.public.customers
SELECT *
FROM faker.default.customers
WHERE 1=1
AND born_at BETWEEN CURRENT_DATE - INTERVAL '150' YEAR AND CURRENT_DATE
AND age_years BETWEEN 0 AND 150
LIMIT 100;

To generate more realistic data, choose specific generators by setting the generator property on columns:

SHOW CREATE TABLE production.public.customers;
-- copy the output of the above query and add some properties:
CREATE TABLE faker.default.customers (
  id UUID NOT NULL,
  name VARCHAR NOT NULL WITH (generator = '#{Name.first_name} #{Name.last_name}'),
  address VARCHAR NOT NULL WITH (generator = '#{Address.fullAddress}'),
  born_at DATE,
  age_years INTEGER
);

See the Datafaker documentation for more information about the expression syntax and available providers.

Usage

Download one of the ZIP packages, unzip it, and copy the trino-faker-0.61 directory to the plugin directory on every node in your Trino cluster. Create a faker.properties file in your Trino catalog directory and set all the required properties.

connector.name=faker

After reloading Trino, you should be able to connect to the faker catalog.
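
One quick way to confirm that the catalog was picked up is to list its contents from any SQL client. This is a minimal sketch: it assumes the catalog file is named faker.properties, so the catalog is called faker, and that the default schema used in the Quick Start exists.

```sql
-- Assumes the catalog is named "faker" (from faker.properties).
SHOW SCHEMAS FROM faker;

-- Assumes the "default" schema from the Quick Start examples.
SHOW TABLES FROM faker.default;
```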

Generators

A particular data generator is selected based on the column type.

For CHAR, VARCHAR, and VARBINARY columns, the default generator uses the Lorem ipsum placeholder text. Unbounded columns will have a random sentence with 3 to 40 words.

To have more control over the format of the generated data, use the generator column property. Some examples of valid generator expressions:

  • #{regexify '(a|b){2,3}'}
  • #{regexify '\\.\\*\\?\\+'}
  • #{bothify '????','false'}
  • #{Name.first_name} #{Name.first_name} #{Name.last_name}
  • #{number.number_between '1','10'}
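
Several of the expressions above contain single quotes, so they have to be escaped when written as a SQL string literal: in SQL, a single quote inside a string is written as two single quotes. The sketch below shows three of the listed expressions embedded in a table definition. The table and column names are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table; only the generator expressions come from the list above.
-- Each single quote inside an expression is doubled ('') to escape it in SQL.
CREATE TABLE faker.default.generator_demo (
  code VARCHAR NOT NULL WITH (generator = '#{regexify ''(a|b){2,3}''}'),
  letters VARCHAR NOT NULL WITH (generator = '#{bothify ''????'',''false''}'),
  small_number VARCHAR NOT NULL WITH (generator = '#{number.number_between ''1'',''10''}')
);

SELECT *
FROM faker.default.generator_demo
LIMIT 5;
```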

Generator expressions cannot be used for non-character-based columns. To limit the range of values generated for such columns, specify constraints in the WHERE clause.
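
As a sketch of constraining non-character columns through the WHERE clause, the query below restricts an integer column and a date column to fixed ranges, in the same way as the Quick Start query. The table name, column names, and ranges are hypothetical.

```sql
-- Hypothetical table; the name and columns are for illustration only.
CREATE TABLE faker.default.orders (
  id UUID NOT NULL,
  quantity INTEGER NOT NULL,
  ordered_at DATE NOT NULL
);

-- Range predicates on the numeric and date columns constrain the generated values.
SELECT *
FROM faker.default.orders
WHERE quantity BETWEEN 1 AND 20
  AND ordered_at BETWEEN DATE '2024-01-01' AND DATE '2024-12-31'
LIMIT 10;
```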

Number of generated rows

To control how many rows are generated for a table, use the LIMIT clause in the query. A default limit can be set using the default_limit table or schema property, or in the connector configuration file.
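
The following sketch sets default_limit as a schema property and as a table property, then queries the table with and without an explicit LIMIT. The schema and table names are hypothetical, and the values are arbitrary. It is assumed here, not confirmed by this README, that the property accepts a plain integer literal.

```sql
-- Hypothetical schema with a default limit for its tables.
CREATE SCHEMA faker.samples WITH (default_limit = 100);

-- Hypothetical table with its own default limit.
CREATE TABLE faker.samples.events (
  id UUID NOT NULL,
  happened_at TIMESTAMP NOT NULL
) WITH (default_limit = 10);

-- No LIMIT clause: the default limit applies.
SELECT * FROM faker.samples.events;

-- Explicit LIMIT clause in the query.
SELECT * FROM faker.samples.events LIMIT 1000;
```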

Null values

For columns without the NOT NULL constraint, null values are generated with a default probability of 50% (0.5). This can be changed using the null_probability property, which can be set for a column, table, or schema. The default value of 0.5 can also be changed in the connector configuration file.
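
A minimal sketch of setting null_probability at the table level and on a single column follows. The table and column names are hypothetical, and the values are arbitrary. Which setting takes precedence when both are present is not stated here; presumably the more specific one wins, but treat that as an assumption.

```sql
-- Hypothetical table; names and probabilities are for illustration only.
CREATE TABLE faker.default.contacts (
  id UUID NOT NULL,                             -- NOT NULL: never null
  email VARCHAR WITH (null_probability = 0.9),  -- column-level setting
  phone VARCHAR                                 -- no column-level setting
) WITH (null_probability = 0.1);                -- table-level setting

SELECT *
FROM faker.default.contacts
LIMIT 20;
```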

Build

Run all the unit test classes.

mvn test

Create a deployable JAR file.

mvn clean compile package

Copy jar files in the target directory to use the connector in your Trino cluster.

cp -p target/*.jar ${PLUGIN_DIRECTORY}/faker/

Deploy

An example command to run the Trino server with the faker plugin and catalog enabled:

src=$(git rev-parse --show-toplevel)
docker run \
  -v "$src/target/trino-faker-0.61-SNAPSHOT:/usr/lib/trino/plugin/faker" \
  -v "$src/catalog:/usr/lib/trino/default/etc/catalog" \
  -p 8080:8080 \
  --name trino \
  -d \
  trinodb/trino:461

Connect to that server using:

docker run -it --rm --link trino trinodb/trino:461 trino --server trino:8080 --catalog faker --schema default

