simplify nearest_neighbors query when ORDER BY clause matches SELECT alias by moracca · Pull Request #20 · ankane/neighbor · GitHub

simplify nearest_neighbors query when ORDER BY clause matches SELECT alias #20


Closed
moracca wants to merge 1 commit

moracca commented May 28, 2024

When the ORDER BY expression exactly matches the expression selected "AS neighbor_distance", we can order by the neighbor_distance alias instead of repeating the expression.

This doesn't change what the query does, but it cuts the query text roughly in half, because the query vector no longer has to appear twice. That makes the query much easier to read when it is written to log files.

For example, it changes a query like this:

SELECT "llm_embeddings"."id", "llm_embeddings"."source_type", "llm_embeddings"."source_id", "llm_embeddings"."created_at", "llm_embeddings"."updated_at", "llm_embeddings"."created_by", "llm_embeddings"."updated_by", "llm_embeddings"."llm_model_id",
  "llm_embeddings"."embedding" <-> '[-0.0017242150271110192,-0.029317252896789353,<.....>,0.024415132566991064]' AS neighbor_distance
FROM "llm_embeddings"
WHERE "llm_embeddings"."source_type" = 'LlmSource' AND "llm_embeddings"."embedding" IS NOT NULL
ORDER BY "llm_embeddings"."embedding" <-> '[-0.0017242150271110192,-0.029317252896789353,<.....>,0.024415132566991064]'
LIMIT 5;

into

SELECT "llm_embeddings"."id", "llm_embeddings"."source_type", "llm_embeddings"."source_id", "llm_embeddings"."created_at", "llm_embeddings"."updated_at", "llm_embeddings"."created_by", "llm_embeddings"."updated_by", "llm_embeddings"."llm_model_id",
  "llm_embeddings"."embedding" <-> '[-0.0017242150271110192,-0.029317252896789353,<.....>,0.024415132566991064]' AS neighbor_distance
FROM "llm_embeddings"
WHERE "llm_embeddings"."source_type" = 'LlmSource' AND "llm_embeddings"."embedding" IS NOT NULL
ORDER BY neighbor_distance
LIMIT 5;

When the query vector has many hundreds or thousands of dimensions, this can really help clean up log files.

ankane (Owner) commented Jun 26, 2024

Hi @moracca, thanks for the PR. However, this will cause issues with methods that change the SELECT clause afterwards, like reselect and pluck (see the failing test case).
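A rough sketch of the failure mode, again using Python's sqlite3 with a hypothetical `items` table and a scalar `abs()` distance in place of `<->`. It assumes that a call like pluck or reselect replaces the SELECT list but keeps the existing ORDER BY. The SQL strings below are hand-written for illustration, not what Active Record actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, x REAL)")
conn.executemany(
    "INSERT INTO items (id, x) VALUES (?, ?)",
    [(1, 0.0), (2, 2.5), (3, 4.0), (4, 9.0), (5, 3.2)],
)

# SELECT list replaced (only id is selected), ORDER BY keeps the full
# expression: the query is still valid.
by_expression = "SELECT id FROM items ORDER BY abs(x - 3.0) LIMIT 3"
ids = [row[0] for row in conn.execute(by_expression)]

# SELECT list replaced, ORDER BY refers to an alias that is no longer
# defined anywhere in the query: the database rejects it.
by_alias = "SELECT id FROM items ORDER BY neighbor_distance LIMIT 3"
try:
    conn.execute(by_alias)
    alias_error = None
except sqlite3.OperationalError as exc:
    alias_error = str(exc)

print(ids)
print(alias_error)
```

SQLite reports the alias as an unknown column here; PostgreSQL should reject the equivalent query for the same reason, since the alias only exists while the aliased expression is still in the SELECT list.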

ankane closed this Jun 26, 2024