feat: Qdrant hybrid search #2787
Signed-off-by: Anush008 <anushshetty90@gmail.com>
Hey @Anush008, thanks a lot for this contribution. It was recently added to the community wishlist and I was about to get to it 😅. Really appreciate it. I'll test it from my side soon and get back!!
a6bbc1d to 0bf5bb8
A few comments @Anush008
For reference on the design we follow, take a quick look here: https://github.com/agno-agi/agno/blob/main/libs/agno/agno/vectordb/lancedb/lance_db.py
- We ideally would want to keep keyword_search, vector_search and hybrid_search as their own functions based on the search type.
- It'd be great if you could default to this approach and implement them as well. Only if you get the time; otherwise we're fine with hybrid_search for now :)
…_hybrid_search() Signed-off-by: Anush008 <anushshetty90@gmail.com>
Hey @kausmeows. I incorporated your suggestions.
Hey guys. Just bumping this PR. Please take a look when possible.
Any updates?
Hey @Anush008, sorry about the delay. We were working on a major feature, #3005, which adds support for knowledge-based filtering (manual + agentic). There have been some major changes in the Qdrant DB file which I think will conflict here. Next step is hybrid search. If you'd like to resolve the conflicts and take this to the finish line, please let me know. Otherwise I can take over this PR and build on top of it. Thanks a lot for this amazing contribution!
…earch Signed-off-by: Anush008 <anushshetty90@gmail.com>
Hello @kausmeows.
e31a2c4 to adefab2
@Anush008 hybrid search is not working??
NOTE: better to add this in the cookbooks in the path-
from agno.agent import Agent
from agno.knowledge.pdf_url import PDFUrlKnowledgeBase
from agno.vectordb.qdrant import Qdrant, SearchType

COLLECTION_NAME = "thai-recipes"
vector_db = Qdrant(collection=COLLECTION_NAME, url="http://localhost:6333", search_type=SearchType.hybrid)
knowledge_base = PDFUrlKnowledgeBase(
    urls=["https://agno-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"],
    vector_db=vector_db,
)
knowledge_base.load(recreate=False)  # Comment out after first run
# Create and use the agent
agent = Agent(knowledge=knowledge_base, show_tool_calls=True)
agent.print_response(
    "List down the ingredients to make Massaman Gai", markdown=True)
Also change the
from agno.vectordb.qdrant.qdrant import Qdrant
from agno.vectordb.search import SearchType
__all__ = [
    "Qdrant", "SearchType"
]
key = f"meta_data.{key}"
filters = self._format_filters(filters)
if self.search_type == SearchType.vector:
    results = self._run_vector_search_sync(query, limit, filters)
Also, let's rename the functions here:
if self.search_type == SearchType.vector:
    results = self.vector_search(query, limit, filters)
elif self.search_type == SearchType.keyword:
    results = self.keyword_search(query, limit, filters)
elif self.search_type == SearchType.hybrid:
    results = self.hybrid_search(query, limit, filters)

if self.search_type == SearchType.vector:
    results = await self.async_vector_search(query, limit, filters)
elif self.search_type == SearchType.keyword:
    results = await self.async_keyword_search(query, limit, filters)
elif self.search_type == SearchType.hybrid:
    results = await self.async_hybrid_search(query, limit, filters)
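For illustration, here is a self-contained sketch of the per-search-type dispatch suggested in this comment. `ToyVectorDb` and its stub methods are hypothetical stand-ins, and `SearchType` is redefined locally so the snippet runs on its own; none of this is the actual agno `Qdrant` class.

```python
from enum import Enum
from typing import Any, Callable, Dict, List, Optional


class SearchType(str, Enum):
    # Local stand-in for agno.vectordb.search.SearchType (assumption).
    vector = "vector"
    keyword = "keyword"
    hybrid = "hybrid"


class ToyVectorDb:
    """Hypothetical vector DB wrapper with one method per search type."""

    def __init__(self, search_type: SearchType = SearchType.vector) -> None:
        self.search_type = search_type

    # Stub searches: each just tags the query so the dispatch is visible.
    def vector_search(self, query: str, limit: int, filters: Optional[Dict[str, Any]] = None) -> List[str]:
        return [f"vector:{query}"][:limit]

    def keyword_search(self, query: str, limit: int, filters: Optional[Dict[str, Any]] = None) -> List[str]:
        return [f"keyword:{query}"][:limit]

    def hybrid_search(self, query: str, limit: int, filters: Optional[Dict[str, Any]] = None) -> List[str]:
        return [f"hybrid:{query}"][:limit]

    def search(self, query: str, limit: int = 5, filters: Optional[Dict[str, Any]] = None) -> List[str]:
        # Map each search type to its dedicated method, then call it.
        handlers: Dict[SearchType, Callable[..., List[str]]] = {
            SearchType.vector: self.vector_search,
            SearchType.keyword: self.keyword_search,
            SearchType.hybrid: self.hybrid_search,
        }
        handler = handlers.get(self.search_type)
        if handler is None:
            raise ValueError(f"Unsupported search type: {self.search_type}")
        return handler(query, limit, filters)


db = ToyVectorDb(search_type=SearchType.hybrid)
results = db.search("massaman gai")
```

A dictionary lookup is used here instead of the if/elif chain only to keep the sketch short; both forms route to the same three methods.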
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.yfinance import YFinanceTools
from langtrace_python_sdk import langtrace  # Must precede other imports
It has to be before the agno imports, right?
@Anush008 also just noticed something- Technically it is a breaking change because it requires new columns, so existing tables would have to be recreated. because my existing qdrant didn't store embeddings as wdyt, can we do that?
Yeah. It defaults to an unnamed dense vector. But now, we're using 2 named vectors for dense and sparse.
This can be made to work, but the code becomes riddled with conditionals and will be hard to maintain.
Hmm, I see, but how big a refactor would that be? I think a few conditionals won't hurt as long as they don't force users to do something they'll surely be very uncomfortable doing. @dirkbrnd @manuhortet how do you feel about this?
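For context on the unnamed versus named vector point: in Qdrant a point's vector is either a bare list (the unnamed default) or a mapping from vector name to vector. Below is a minimal plain-Python sketch of the kind of conditional being discussed. The names "dense" and "sparse" and the helper `dense_vector_of` are hypothetical, not taken from this PR.

```python
from typing import Any, Dict, List, Union

# A point vector is either an unnamed dense list or a dict of named vectors.
PointVector = Union[List[float], Dict[str, Any]]


def dense_vector_of(point_vector: PointVector, dense_name: str = "dense") -> List[float]:
    """Return the dense embedding from either storage layout.

    dense_name is an assumed vector name; a real collection may use another.
    """
    if isinstance(point_vector, dict):
        # New layout: named vectors, e.g. {"dense": [...], "sparse": {...}}.
        return point_vector[dense_name]
    # Old layout: a single unnamed dense vector.
    return point_vector


old_layout = [0.1, 0.2, 0.3]
new_layout = {
    "dense": [0.1, 0.2, 0.3],
    "sparse": {"indices": [7, 42], "values": [0.5, 1.5]},
}

old_dense = dense_vector_of(old_layout)
new_dense = dense_vector_of(new_layout)
```

Every read and write path would need a branch like this to support both layouts, which is the maintenance cost raised above.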
Summary
This PR adds support for hybrid searches when using Qdrant as the vector store.
qdrant/fastembed's BM25 is used for sparse embeddings by default (customizable).
Qdrant's hybrid search reference: https://qdrant.tech/documentation/concepts/hybrid-queries/#hybrid-search
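As background on how hybrid search combines dense and sparse results, the linked Qdrant documentation describes Reciprocal Rank Fusion (RRF) as one fusion option. The following is a plain-Python sketch of RRF itself, not Qdrant's implementation and not necessarily what this PR configures. The function name and the constant `k=60` (a value common in the literature) are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse several ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: DefaultDict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; documents missing from a list contribute nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["a", "b", "c"]   # e.g. ranked by dense embedding similarity
sparse_hits = ["b", "d", "a"]  # e.g. ranked by BM25 sparse scores
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Documents that appear high in both lists ("b" and "a" here) end up ahead of documents found by only one retriever.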
Type of change