Description
The checks for `enforce_home_region` are currently performed at execbuild time, when the system begins to build a physical plan from a logical plan. This design is flawed because some KV spans accessed during query execution cannot be known at execbuild time; for example, only during execution is it possible to know whether a lookup join with non-constant input will access remote regions. As a result, false positives are possible: queries that would not access data in remote regions incorrectly fail with a "query has no home region" error.
Here's a simple example in a multi-region logictest:
```
# LogicTest: multiregion-9node-3region-3azs !metamorphic-batch-sizes

statement ok
CREATE DATABASE multi_region_test_db PRIMARY REGION "ca-central-1" REGIONS "ap-southeast-2", "us-east-1" SURVIVE ZONE FAILURE

statement ok
USE multi_region_test_db

sleep 5s

statement ok
CREATE TABLE t (
  crdb_region crdb_internal_region,
  k INT PRIMARY KEY,
  v STRING
) LOCALITY REGIONAL BY ROW

statement ok
INSERT INTO t (crdb_region, k, v) VALUES ('ap-southeast-2', 5, 'five'), ('us-east-1', 6, 'six')

statement ok
SET enforce_home_region = true

query T
SELECT gateway_region();
----
ap-southeast-2

# The test fails here with the error: "Query has no home region. Try adding a
# filter on t.crdb_region and/or on key column (t.k)".
statement ok
SELECT t.v FROM (VALUES ('ap-southeast-2', 5)) AS v(r, k)
INNER LOOKUP JOIN t ON v.r::crdb_internal_region = t.crdb_region AND v.k = t.k
```
The query plan looks like:

```
project
 ├── columns: v:5
 ├── cardinality: [0 - 1]
 ├── immutable
 ├── stats: [rows=0.333333]
 ├── cost: 1e+100
 ├── cost-flags: huge-cost-penalty
 ├── key: ()
 ├── fd: ()-->(5)
 ├── distribution: ap-southeast-2
 ├── prune: (5)
 └── inner-join (lookup t)
      ├── columns: column2:2 crdb_region:3 k:4 v:5 column10:10
      ├── flags: force lookup join (into right side)
      ├── key columns: [10 2] = [3 4]
      ├── lookup columns are key
      ├── cardinality: [0 - 1]
      ├── immutable
      ├── stats: [rows=0.333333, distinct(2)=0.333333, null(2)=0, distinct(3)=0.333333, null(3)=0, distinct(4)=0.333333, null(4)=0, distinct(10)=0.333333, null(10)=0]
      ├── cost: 1e+100
      ├── cost-flags: huge-cost-penalty
      ├── key: ()
      ├── fd: ()-->(2-5,10), (2)==(4), (4)==(2), (3)==(10), (10)==(3)
      ├── distribution: ap-southeast-2
      ├── lookup table distribution: ap-southeast-2,ca-central-1,us-east-1
      ├── values
      │    ├── columns: column2:2 column10:10
      │    ├── cardinality: [1 - 1]
      │    ├── stats: [rows=1, distinct(2)=1, null(2)=0, distinct(10)=1, null(10)=0]
      │    ├── cost: 0.02
      │    ├── key: ()
      │    ├── fd: ()-->(2,10)
      │    ├── distribution: ap-southeast-2
      │    ├── prune: (2,10)
      │    └── (5, 'ap-southeast-2')
      └── filters (true)
```
Notice how the distribution of the lookup join includes all regions. Technically, in this case it is possible to determine at execbuild time that the distribution of the lookup join is solely ap-southeast-2, but a slight tweak of the query to change the Values expression into a Scan would make that impossible.
We should create a utility for detecting remote-region access that is built during optimization/execbuild time and consulted during execution time. The utility would be created for each operator that performs KV operations; those operators would pass all KV read spans through the utility during execution to ensure that they only access the home region.
Jira issue: CRDB-51582