[Bug]: TimescaleDB Insert with Parallel Copy into hypertable is 10x slower compared to inserting into non-hypertables #8023
Open
Description

@georgetanjs

What type of bug is this?

Performance issue

What subsystems and features are affected?

Data ingestion

What happened?

Hey guys, I'm bulk loading some orderbook data into TimescaleDB using timescaledb-parallel-copy; however, I noticed that the performance difference between loading data into a hypertable and a non-hypertable is huge.

TimescaleDB version affected

2.17.2

PostgreSQL version used

PostgreSQL 17.4 (Debian 17.4-1.pgdg120+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit

What operating system did you use?

Debian GNU/Linux 12 (bookworm)

What installation method did you use?

Deb/Apt

What platform did you run on?

On prem/Self-hosted

timescaledb-parallel-copy version

timescaledb-parallel-copy v0.9.0 (linux amd64)

How can we reproduce the bug?

  1. Create Tables. Run these SQL commands to create the tables.
-- Create the hypertable version
CREATE TABLE book_change_hyper (
    ts TIMESTAMPTZ NOT NULL,
    exchange TEXT NOT NULL,
    market TEXT NOT NULL,
    bids DOUBLE PRECISION[] NOT NULL,
    asks DOUBLE PRECISION[] NOT NULL
);

-- Make it a TimescaleDB hypertable
SELECT create_hypertable('book_change_hyper', 'ts');

-- Set the chunk time interval to 4 hours
SELECT set_chunk_time_interval('book_change_hyper'::regclass, INTERVAL '4 hour');

-- Create the non-hypertable version (regular PostgreSQL table)
CREATE TABLE book_change_regular (
    ts TIMESTAMPTZ NOT NULL,
    exchange TEXT NOT NULL,
    market TEXT NOT NULL,
    bids DOUBLE PRECISION[] NOT NULL,
    asks DOUBLE PRECISION[] NOT NULL
);
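
As a sanity check, the 4-hour interval can be confirmed after creating the hypertable with something like the following (a sketch, assuming the standard timescaledb_information.dimensions view):

-- Confirm the chunk interval configured for the time dimension
SELECT hypertable_name, column_name, time_interval
FROM timescaledb_information.dimensions
WHERE hypertable_name = 'book_change_hyper';
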
  2. Generate dummy data. Run this Python script to populate test_data.csv with our orderbook data.

Feel free to adjust the start_date and end_date to populate more data so that the performance degradation will be more visible.

#!/usr/bin/env python3

import csv
import random
import datetime
from datetime import timedelta, timezone

# Configuration
exchange = 'binance'
market = 'BTC/USDT'
output_file = 'test_data.csv'

start_date = datetime.datetime(2019, 1, 1, tzinfo=timezone.utc)
end_date = datetime.datetime(2022, 1, 1, tzinfo=timezone.utc)

# Calculate number of minutes (for progress reporting)
total_minutes = int((end_date - start_date).total_seconds() / 60)
print(f"Generating {total_minutes} rows of test data (one per minute from 2019 to 2025)...")

# Function to generate a random order book entry with flattened arrays
def generate_random_book(current_date):
    # Generate 5 levels of bids and asks, but flatten the arrays
    num_levels = 5
    bids_flat = []
    asks_flat = []
    
    # Base price varies by year to simulate price changes over time
    year = current_date.year
    year_factor = (year - 2019) * 5000  # Price increases each year
    base_price = 10000 + year_factor + random.uniform(-500, 500)
    
    for i in range(num_levels):
        # Bids (price, quantity) - flattened
        bid_price = base_price - (i * 10) - random.uniform(0, 5)
        bid_qty = random.uniform(0.1, 2.0)
        bids_flat.append(round(bid_price, 2))
        bids_flat.append(round(bid_qty, 8))
        
        # Asks (price, quantity) - flattened
        ask_price = base_price + (i * 10) + random.uniform(0, 5)
        ask_qty = random.uniform(0.1, 2.0)
        asks_flat.append(round(ask_price, 2))
        asks_flat.append(round(ask_qty, 8))
    
    # Format as PostgreSQL array literals
    bids_str = "{" + ",".join(str(val) for val in bids_flat) + "}"
    asks_str = "{" + ",".join(str(val) for val in asks_flat) + "}"
    
    return bids_str, asks_str

# Generate data
with open(output_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter='|')
    
    current_date = start_date
    counter = 0
    
    while current_date < end_date:
        if counter % 100000 == 0:
            print(f"Generated {counter} rows... ({round(counter/total_minutes*100, 2)}%)")
        
        # Generate timestamp
        ts_str = current_date.isoformat()
        
        # Generate book data
        bids_str, asks_str = generate_random_book(current_date)
        
        # Write row
        writer.writerow([ts_str, exchange, market, bids_str, asks_str])
        
        # Move to next minute
        current_date += timedelta(minutes=1)
        counter += 1

print(f"Test data generated and saved to {output_file}")
  3. Import the data from test_data.csv into the database using timescaledb-parallel-copy.
#!/bin/bash

# Database connection parameters - modify these values for your environment
PGDB="your_database"
PGHOST="localhost" 
PGUSER="postgres"
PGPASSWORD="your_password"
PGPORT="5432"
SCHEMA="public"

# Test regular table first
echo "=== TESTING REGULAR TABLE ==="
echo "Starting regular table insert: $(date)"
cat test_data.csv | timescaledb-parallel-copy \
    -verbose \
    -reporting-period 5s \
    -schema "$SCHEMA" \
    -table "book_change_regular" \
    -connection "dbname=$PGDB host=$PGHOST user=$PGUSER password=$PGPASSWORD port=$PGPORT" \
    -workers 10 \
    -batch-size 50000 \
    -copy-options "" \
    -split "|"
echo "Finished regular table insert: $(date)"

# Test hypertable second
echo "=== TESTING HYPERTABLE ==="
echo "Starting hypertable insert: $(date)"
cat test_data.csv | timescaledb-parallel-copy \
    -verbose \
    -reporting-period 5s \
    -schema "$SCHEMA" \
    -table "book_change_hyper" \
    -connection "dbname=$PGDB host=$PGHOST user=$PGUSER password=$PGPASSWORD port=$PGPORT" \
    -workers 10 \
    -batch-size 50000 \
    -copy-options "" \
    -split "|"
echo "Finished hypertable insert: $(date)"

echo "Tests completed."


Results

=== TESTING REGULAR TABLE ===
Starting regular table insert: Tue Apr 29 04:39:38 UTC 2025
2025/04/29 04:39:38 There will be reports every 5s
2025/04/29 04:39:38 Copy command: COPY "public"."book_change_regular" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:38 Copy command: COPY "public"."book_change_regular" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:38 Copy command: COPY "public"."book_change_regular" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:38 Copy command: COPY "public"."book_change_regular" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:38 Copy command: COPY "public"."book_change_regular" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:38 Copy command: COPY "public"."book_change_regular" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:38 Copy command: COPY "public"."book_change_regular" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:38 Copy command: COPY "public"."book_change_regular" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:38 Copy command: COPY "public"."book_change_regular" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:38 Copy command: COPY "public"."book_change_regular" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:39 total rows 1578240
2025/04/29 04:39:39 (1s), row rate 1284828.51/sec (period), row rate 1285061.58/sec (overall), 1,578,240 total rows
COPY 1578240, took 1.228348823s with 10 worker(s) (mean rate 1284846.755619/sec)
Finished regular table insert: Tue Apr 29 04:39:39 UTC 2025
=== TESTING HYPERTABLE ===
Starting hypertable insert: Tue Apr 29 04:39:39 UTC 2025
2025/04/29 04:39:39 There will be reports every 5s
2025/04/29 04:39:39 Copy command: COPY "public"."book_change_hyper" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:39 Copy command: COPY "public"."book_change_hyper" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:39 Copy command: COPY "public"."book_change_hyper" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:39 Copy command: COPY "public"."book_change_hyper" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:39 Copy command: COPY "public"."book_change_hyper" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:39 Copy command: COPY "public"."book_change_hyper" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:39 Copy command: COPY "public"."book_change_hyper" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:39 Copy command: COPY "public"."book_change_hyper" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:39 Copy command: COPY "public"."book_change_hyper" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:39 Copy command: COPY "public"."book_change_hyper" FROM STDIN WITH DELIMITER '|'
2025/04/29 04:39:44 (5s), row rate 9996.73/sec (period), row rate 9998.11/sec (overall), 50,000 total rows
2025/04/29 04:39:46 total rows 1578240
2025/04/29 04:39:49 (10s), row rate 10000.00/sec (period), row rate 9999.05/sec (overall), 100,000 total rows
2025/04/29 04:39:54 (15s), row rate 10000.00/sec (period), row rate 9999.37/sec (overall), 150,000 total rows
2025/04/29 04:39:59 (20s), row rate 19999.99/sec (period), row rate 12499.41/sec (overall), 250,000 total rows
2025/04/29 04:40:04 (25s), row rate 10000.00/sec (period), row rate 11999.55/sec (overall), 300,000 total rows
2025/04/29 04:40:09 (30s), row rate 10000.00/sec (period), row rate 11666.30/sec (overall), 350,000 total rows
2025/04/29 04:40:14 (35s), row rate 10000.00/sec (period), row rate 11428.26/sec (overall), 400,000 total rows
2025/04/29 04:40:19 (40s), row rate 10000.00/sec (period), row rate 11249.73/sec (overall), 450,000 total rows
2025/04/29 04:40:24 (45s), row rate 20000.00/sec (period), row rate 12221.97/sec (overall), 550,000 total rows

We can see that inserting into the regular table is very fast, while inserting into the hypertable is much, much slower.

In reality, our data is much larger, and from what I see, inserting into non-hypertables runs at around 10-20k rows/sec, while inserting into hypertables runs at around 3-4k rows/sec.

It makes sense that hypertables would have somewhat slower inserts, but this is roughly a 10x performance reduction.
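
For context, a 4-hour chunk interval over this three-year range spreads the load across roughly 6,500 chunks; the exact number can be checked with a query along these lines (assuming the standard timescaledb_information.chunks view):

-- Count how many chunks the hypertable load produced
SELECT count(*) AS chunk_count
FROM timescaledb_information.chunks
WHERE hypertable_name = 'book_change_hyper';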
