Description
CrateDB version
5.10.7
CrateDB setup information
CrateDB Cloud single-node S4
Problem description
When querying a large result set with vectors of length 1152, an OutOfMemoryError is thrown instead of a CircuitBreakingException.
The issue does not occur when the vector column is defined as a plain ARRAY(DOUBLE).
Steps to Reproduce
CREATE TABLE IF NOT EXISTS "doc"."WayveScenes" (
    "Vector_ID" TEXT,
    "File_Path" TEXT,
    "Coordinate" ARRAY(DOUBLE PRECISION),
    "Vector" ARRAY(DOUBLE PRECISION)
)
CLUSTERED INTO 4 SHARDS;

CREATE TABLE IF NOT EXISTS "doc"."WayveScenesVectors" (
    "Vector_ID" TEXT,
    "File_Path" TEXT,
    "Coordinate" GEO_POINT,
    "Vector" FLOAT_VECTOR(1152)
)
CLUSTERED INTO 4 SHARDS;
I imported the file https://huggingface.co/datasets/quasara-io/WayveScenes/resolve/main/data/Main_1-00000-of-00001.parquet using the Cloud import functionality into both tables.
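Before running the failing queries, it may help to confirm that both imports produced the same number of rows. The following is a minimal sketch; the row counts were not recorded in this report, so no expected values are given.

```sql
-- Make recently imported rows visible to subsequent queries.
REFRESH TABLE "doc"."WayveScenes";
REFRESH TABLE "doc"."WayveScenesVectors";

-- Both counts should match if the two imports processed the same file completely.
SELECT count(*) FROM "doc"."WayveScenes";
SELECT count(*) FROM "doc"."WayveScenesVectors";
```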
Actual Result
Querying the table with ARRAY(DOUBLE) fails with a CircuitBreakingException, which is expected given that the node is fairly small and there is no LIMIT clause:
SELECT * FROM "WayveScenes";
org.elasticsearch.common.breaker.CircuitBreakingException: Allocating 2mb for 'parent: http-result' failed, breaker would use 983.9mb in total. Limit is 972.7mb. Either increase memory and limit, change the query or reduce concurrent query load
at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:275)
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:129)
at io.crate.rest.action.SqlHttpHandler.lambda$executeSimpleRequest$1(SqlHttpHandler.java:259)
at io.crate.data.breaker.BlockBasedRamAccounting.addBytes(BlockBasedRamAccounting.java:74)
at io.crate.breaker.TypedRowAccounting.accountForAndMaybeBreak(TypedRowAccounting.java:75)
at io.crate.breaker.TypedRowAccounting.accountForAndMaybeBreak(TypedRowAccounting.java:36)
at io.crate.rest.action.RestResultSetReceiver.setNextRow(RestResultSetReceiver.java:66)
at io.crate.session.RetryOnFailureResultReceiver.setNextRow(RetryOnFailureResultReceiver.java:70)
at io.crate.session.RowConsumerToResultReceiver.consumeIt(RowConsumerToResultReceiver.java:81)
at io.crate.session.RowConsumerToResultReceiver.accept(RowConsumerToResultReceiver.java:61)
at io.crate.execution.engine.InterceptingRowConsumer.lambda$tryForwardResult$1(InterceptingRowConsumer.java:93)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1575)
However, the same query against the FLOAT_VECTOR table leads to an OutOfMemoryError:
SELECT * FROM "WayveScenesVectors";
java.lang.OutOfMemoryError: Java heap space
-- no stack trace available
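To help narrow this down, the node's heap can be inspected and the query rerun with a bounded result set. This is only a diagnostic sketch: the LIMIT value of 100 is an arbitrary assumption, and it has not been verified on this cluster that the bounded query avoids the error.

```sql
-- Heap size and current usage per node (sys.nodes is a built-in system table).
SELECT name,
       heap['max']  AS heap_max_bytes,
       heap['used'] AS heap_used_bytes
FROM sys.nodes;

-- Bounded variant of the failing query; 100 is an arbitrary illustrative limit.
SELECT * FROM "doc"."WayveScenesVectors" LIMIT 100;
```

If the bounded query succeeds, that would suggest the problem depends on result-set size rather than on reading FLOAT_VECTOR values as such.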
Expected Result
A CircuitBreakingException in both cases.