`OutOfMemoryError` when querying table with large vector field · Issue #17999 · crate/crate · GitHub
Open
@hammerhead

Description

CrateDB version

5.10.7

CrateDB setup information

CrateDB Cloud single-node S4

Problem description

When querying a large result set from a table with a FLOAT_VECTOR(1152) column, the node fails with java.lang.OutOfMemoryError instead of aborting the query with a CircuitBreakingException.

The issue does not occur when the same vectors are stored in a plain ARRAY(DOUBLE PRECISION) column; in that case the circuit breaker trips as expected.
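For a sense of scale, the raw payload of one 1152-dimension vector can be computed directly. This is a back-of-envelope sketch that assumes 4-byte floats for FLOAT_VECTOR and 8-byte doubles for ARRAY(DOUBLE PRECISION), and it ignores JVM object headers, boxing and per-row overhead, so real heap usage will be higher. The row count used below is hypothetical; the issue does not state how many rows the dataset contains.

```python
# Raw vector payload only. Assumption: JVM object overhead is ignored,
# so actual heap usage per row is higher (especially for boxed doubles).
DIMENSIONS = 1152
FLOAT32_BYTES = 4  # element size assumed for FLOAT_VECTOR
FLOAT64_BYTES = 8  # element size of DOUBLE PRECISION


def raw_vector_bytes(dimensions: int, element_bytes: int) -> int:
    """Return the raw payload size of one vector in bytes."""
    return dimensions * element_bytes


def raw_result_mib(rows: int, dimensions: int, element_bytes: int) -> float:
    """Return the raw vector payload of a whole result set in MiB."""
    return rows * raw_vector_bytes(dimensions, element_bytes) / (1024 * 1024)


print("FLOAT_VECTOR(1152):", raw_vector_bytes(DIMENSIONS, FLOAT32_BYTES), "bytes")  # 4608
print("ARRAY(DOUBLE PRECISION):", raw_vector_bytes(DIMENSIONS, FLOAT64_BYTES), "bytes")  # 9216

# Hypothetical row count, for illustration only.
rows = 100_000
print(f"{rows} rows as FLOAT_VECTOR: {raw_result_mib(rows, DIMENSIONS, FLOAT32_BYTES):.1f} MiB")
```

Under these assumptions the FLOAT_VECTOR payload is half the size of the double array, so raw data volume alone does not obviously explain why only the FLOAT_VECTOR query exhausts the heap; one plausible reading, not confirmed in this issue, is that the two types are accounted for differently.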

Steps to Reproduce

CREATE TABLE IF NOT EXISTS "doc"."WayveScenes" (
   "Vector_ID" TEXT,
   "File_Path" TEXT,
   "Coordinate" ARRAY(DOUBLE PRECISION),
   "Vector" ARRAY(DOUBLE PRECISION)
)
CLUSTERED INTO 4 SHARDS;

CREATE TABLE IF NOT EXISTS "doc"."WayveScenesVectors" (
   "Vector_ID" TEXT,
   "File_Path" TEXT,
   "Coordinate" GEO_POINT,
   "Vector" FLOAT_VECTOR(1152)
)
CLUSTERED INTO 4 SHARDS;

Using the CrateDB Cloud import functionality, I imported the file https://huggingface.co/datasets/quasara-io/WayveScenes/resolve/main/data/Main_1-00000-of-00001.parquet into both tables.

Actual Result

Querying the table with the ARRAY(DOUBLE PRECISION) column fails with a CircuitBreakingException, which is expected given that the node is fairly small and the query has no LIMIT clause:

SELECT * FROM "WayveScenes";

org.elasticsearch.common.breaker.CircuitBreakingException: Allocating 2mb for 'parent: http-result' failed, breaker would use 983.9mb in total. Limit is 972.7mb. Either increase memory and limit, change the query or reduce concurrent query load
	at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:275)
	at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:129)
	at io.crate.rest.action.SqlHttpHandler.lambda$executeSimpleRequest$1(SqlHttpHandler.java:259)
	at io.crate.data.breaker.BlockBasedRamAccounting.addBytes(BlockBasedRamAccounting.java:74)
	at io.crate.breaker.TypedRowAccounting.accountForAndMaybeBreak(TypedRowAccounting.java:75)
	at io.crate.breaker.TypedRowAccounting.accountForAndMaybeBreak(TypedRowAccounting.java:36)
	at io.crate.rest.action.RestResultSetReceiver.setNextRow(RestResultSetReceiver.java:66)
	at io.crate.session.RetryOnFailureResultReceiver.setNextRow(RetryOnFailureResultReceiver.java:70)
	at io.crate.session.RowConsumerToResultReceiver.consumeIt(RowConsumerToResultReceiver.java:81)
	at io.crate.session.RowConsumerToResultReceiver.accept(RowConsumerToResultReceiver.java:61)
	at io.crate.execution.engine.InterceptingRowConsumer.lambda$tryForwardResult$1(InterceptingRowConsumer.java:93)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1575)
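The trace above shows the result receiver accounting for each row (TypedRowAccounting.accountForAndMaybeBreak), reserving memory in blocks (BlockBasedRamAccounting.addBytes, which the message shows as 2mb allocations) and the parent breaker rejecting the reservation once the limit would be exceeded. The following is a simplified, hypothetical Python model of that pattern, not CrateDB's code; all class and function names are illustrative, and the fixed 2 MiB block size is an assumption based on the message. It illustrates one possible, unconfirmed explanation for the bug: if the per-row size estimate for a column type is too low, the breaker never trips and rows keep being buffered until the heap runs out.

```python
MiB = 1024 * 1024


class CircuitBreakingError(Exception):
    """Hypothetical stand-in for CircuitBreakingException."""


class ParentBreaker:
    """Simplified parent breaker with a fixed byte limit (hypothetical)."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def add_estimate_and_maybe_break(self, nbytes: int) -> None:
        new_total = self.used_bytes + nbytes
        if new_total > self.limit_bytes:
            raise CircuitBreakingError(
                f"Allocating {nbytes} bytes failed, breaker would use "
                f"{new_total} bytes in total. Limit is {self.limit_bytes} bytes."
            )
        self.used_bytes = new_total


class BlockAccounting:
    """Reserve breaker memory in fixed-size blocks (2 MiB assumed)."""

    def __init__(self, breaker: ParentBreaker, block_bytes: int = 2 * MiB) -> None:
        self.breaker = breaker
        self.block_bytes = block_bytes
        self.reserved = 0  # bytes reserved from the breaker so far
        self.used = 0      # bytes the rows are estimated to occupy

    def add_bytes(self, nbytes: int) -> None:
        self.used += nbytes
        # Reserve more blocks until the reservation covers the estimated usage.
        while self.used > self.reserved:
            self.breaker.add_estimate_and_maybe_break(self.block_bytes)
            self.reserved += self.block_bytes


def consume_rows(row_sizes, accounting: BlockAccounting) -> int:
    """Account for each row's estimated size before buffering it."""
    accepted = 0
    for size in row_sizes:
        accounting.add_bytes(size)
        accepted += 1
    return accepted


# Rows with a realistic estimate trip the breaker once 16 MiB would be exceeded.
accounting = BlockAccounting(ParentBreaker(limit_bytes=16 * MiB))
try:
    consume_rows([4608] * 10_000, accounting)
except CircuitBreakingError as exc:
    print("breaker tripped:", exc)

# With an estimate of 0 bytes per row, nothing is ever reserved and every
# row is accepted, regardless of how much heap the rows really use.
underestimated = BlockAccounting(ParentBreaker(limit_bytes=16 * MiB))
print("rows accepted:", consume_rows([0] * 10_000, underestimated))
```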

However, the same query against the FLOAT_VECTOR table fails with an OutOfMemoryError:

SELECT * FROM "WayveScenesVectors";

java.lang.OutOfMemoryError: Java heap space
-- no stack trace available

Expected Result

CircuitBreakingException in both cases.

Metadata

Labels: bug (Clear identification of incorrect behaviour)