The Faker connector is now part of Trino, since release 464.
This is a Trino connector that generates random data. It has three goals:
- Be easy to use.
- Support most of Trino's data types.
- Generate random data that looks as real as possible and is correct, that is, it matches all the constraints.
To run a Docker container with the connector, run the following:
docker run \
-d \
--name trino-faker \
-p 8080:8080 \
nineinchnick/trino-faker:0.61
Then use your favorite SQL client to connect to Trino running at http://localhost:8080
Try creating a table that looks like an existing table in a real database and insert some random data back into it:
CREATE TABLE faker.default.customer (LIKE production.public.customer EXCLUDING PROPERTIES);
INSERT INTO production.public.customer
SELECT *
FROM faker.default.customer
WHERE 1=1
AND born_at BETWEEN CURRENT_DATE - INTERVAL '150' YEAR AND CURRENT_DATE
AND age_years BETWEEN 0 AND 150
LIMIT 100;
To generate more realistic data, choose specific generators by setting the generator property on columns:
SHOW CREATE TABLE production.public.customer;
-- copy the output of the above query and add some properties:
CREATE TABLE faker.default.customer (
id UUID NOT NULL,
name VARCHAR NOT NULL WITH (generator = '#{Name.first_name} #{Name.last_name}'),
address VARCHAR NOT NULL WITH (generator = '#{Address.fullAddress}'),
born_at DATE,
age_years INTEGER
);
See the Datafaker documentation for more information about the expression syntax and available providers.
Download one of the ZIP packages, unzip it, and copy the trino-faker-0.61 directory to the plugin directory on every node in your Trino cluster.
Create a faker.properties file in your Trino catalog directory and set all the required properties:
connector.name=faker
After reloading Trino, you should be able to connect to the faker catalog.
A particular data generator is selected based on the column type.
For CHAR, VARCHAR, and VARBINARY columns, the default generator uses the Lorem ipsum placeholder text.
Unbounded columns will have a random sentence with 3 to 40 words.
To have more control over the format of the generated data, use the generator column property. Some examples of valid generator expressions:
#{regexify '(a|b){2,3}'}
#{regexify '\\.\\*\\?\\+'}
#{bothify '????','false'}
#{Name.first_name} #{Name.first_name} #{Name.last_name}
#{number.number_between '1','10'}
Generator expressions cannot be used for non-character-based columns. To limit their data range, specify constraints in the WHERE clause.
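As a hedged illustration of limiting non-character columns, the following query restricts the born_at and age_years columns of the faker.default.customer table defined earlier. The specific ranges are arbitrary values chosen for this sketch.

```sql
-- Sketch only: the ranges below are arbitrary illustrative values.
-- The connector is expected to generate born_at and age_years inside them.
SELECT id, born_at, age_years
FROM faker.default.customer
WHERE born_at BETWEEN DATE '1970-01-01' AND DATE '1999-12-31'
  AND age_years BETWEEN 18 AND 65
LIMIT 10;
```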
To control how many rows are generated for a table, use the LIMIT clause in the query.
A default limit can be set using the default_limit table or schema property, or in the connector configuration file.
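A minimal sketch of both ways to control the row count, assuming default_limit is passed through the standard Trino WITH (...) table property syntax. The table name small_sample and the values 10 and 3 are hypothetical.

```sql
-- Assumption: default_limit is set like any other Trino table property.
CREATE TABLE faker.default.small_sample (
    id UUID NOT NULL,
    name VARCHAR NOT NULL
)
WITH (default_limit = 10);

-- No LIMIT clause: the table's default_limit is expected to apply.
SELECT * FROM faker.default.small_sample;

-- An explicit LIMIT sets the row count for this query.
SELECT * FROM faker.default.small_sample LIMIT 3;
```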
For columns without the NOT NULL constraint, null values will be generated using the default probability of 50% (0.5).
This probability can be changed using the null_probability property set for a column, table, or schema.
The default value of 0.5 can also be changed in the connector configuration file.
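The following sketch shows how null_probability might be set at the table and column level. It assumes the property is accepted in WITH (...) in both places and that the column-level value overrides the table-level one; the table name sparse_customer and the chosen probabilities are hypothetical.

```sql
-- Assumption: null_probability uses the standard WITH (...) property syntax,
-- and a column-level setting takes precedence over the table-level setting.
CREATE TABLE faker.default.sparse_customer (
    id UUID NOT NULL,                                -- NOT NULL: never null
    nickname VARCHAR WITH (null_probability = 0.9),  -- mostly null
    born_at DATE                                     -- falls back to the table value
)
WITH (null_probability = 0.1);

-- Compare the share of nulls per column in a sample.
-- COUNT(column) skips nulls, so the difference from COUNT(*) is the null count.
SELECT
    count(*) AS total_rows,
    count(*) - count(nickname) AS nickname_nulls,
    count(*) - count(born_at) AS born_at_nulls
FROM (SELECT * FROM faker.default.sparse_customer LIMIT 1000) AS sample;
```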
Run all the unit test classes.
mvn test
Create a deployable JAR file.
mvn clean compile package
Copy the JAR files from the target directory to use the connector in your Trino cluster.
cp -p target/*.jar ${PLUGIN_DIRECTORY}/faker/
An example command to run the Trino server with the faker plugin and catalog enabled:
src=$(git rev-parse --show-toplevel)
docker run \
-v $src/target/trino-faker-0.61-SNAPSHOT:/usr/lib/trino/plugin/faker \
-v $src/catalog:/usr/lib/trino/default/etc/catalog \
-p 8080:8080 \
--name trino \
-d \
trinodb/trino:461
Connect to that server using:
docker run -it --rm --link trino trinodb/trino:461 trino --server trino:8080 --catalog faker --schema default