Codebase indexing (Clean history) #3137
Just saw this and haven't looked at the code yet but first thought:
I have a lot of past experience with both. They're fine but not what I would expect to be integrated in an OSS project such as this.
Thank you for your feedback @bitnom. Let me respond to your points.
It also supports Ollama.
You didn't offer any alternatives. At this point changing the vector store shouldn't be too much work, but as you might understand, I would need some good reasons to change it when I'm nearly finished with this.
While I'm totally new to this topic and can't offer any personal experience with it, and thus might be falling for some marketing claims, I recently searched for a graph DB for a RAG system. There I stumbled over FalkorDB, which as far as I understand is an extended RedisGraph, based on Redis. From there: https://www.falkordb.com/blog/vector-database-vs-graph-database/
And from what I read elsewhere, graph databases hold huge potential as well. So an integrated solution that supports both might have even more benefits in the long run if we ever decide to put the tasks and their interactions into some graph structure, augmented with code references etc. It's open source and even has a free hosted tier for those who don't care about why it's free as long as it's free :D Edit: And, again without personal experience and totally unrelated, a framework to throw into the ring: https://ts.llamaindex.ai (there's also a Python variant, but given the codebase here is TypeScript I guess the TS variant is more relevant)
What about adding gemini-embedding-exp-03-07?
@v0idRift
@daniel-lxs I really like this work. I tried it hands-on (even though I know it is still in the development phase). Any hint as to why this is happening and how it can be resolved?
@Shanmukh-C you can go ahead and try again. Let me know if the issue is fixed.
cca718a to 8c23329
@mrubens I have reviewed this and @daniel-lxs has some bugs to squash. Once he is done he is going to add the tool to the workflow and see how well it operates automatically. Right now it seemed to work quite well for exploring the codebase and was quite smooth once I was able to get it working.
I took note of the bugs that @hannesrudolph was getting, but overall I think when he actually got the tool working it was pretty smooth. I'll try to get some time to work on it this week.
Please take a look at Gemini Embedding Experimental 03-07 if you have some time to play with it. This model looks promising. I know more models will be added in the future, TY <3
446178e to 9f96628
Clipboard-20250514-004450-720.1.mp4
indexingWOOOOOOO.mp4
Here are two examples showing this working. You might need to pause them to see the results.
… deleting the cache file
…e settings localization
269f7b2 to 2331cf3
export const MAX_SEARCH_RESULTS = 50 // Maximum number of search results to return

/**File Watcher */
export const QDRANT_CODE_BLOCK_NAMESPACE = "f47ac10b-58cc-4372-a567-0e02b2c3d479"
Just curious, where does this come from?
Think of this as a salt. It basically lets me generate deterministic UUIDs from the file paths, so I can use the file path to overwrite the entries of changed files and prevent them from being orphaned.
The reason I need deterministic UUIDs is that Qdrant doesn't allow me to save an arbitrary string as an ID, only a UUID.
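A minimal sketch of the idea described above, assuming the IDs are name-based (version 5, SHA-1) UUIDs as defined in RFC 4122. The PR may well use a library for this; `uuidV5` and `pointIdForPath` are hypothetical names used only for illustration, and only the namespace constant comes from the diff.

```typescript
import { createHash } from "node:crypto"

// Namespace constant quoted from the diff above.
const QDRANT_CODE_BLOCK_NAMESPACE = "f47ac10b-58cc-4372-a567-0e02b2c3d479"

// Name-based UUID (version 5): SHA-1 over the namespace bytes followed by the name.
function uuidV5(name: string, namespace: string): string {
  const namespaceBytes = Buffer.from(namespace.replace(/-/g, ""), "hex")
  const digest = createHash("sha1").update(namespaceBytes).update(name, "utf8").digest()
  const bytes = Buffer.from(digest.subarray(0, 16))
  bytes.writeUInt8((bytes.readUInt8(6) & 0x0f) | 0x50, 6) // set version 5
  bytes.writeUInt8((bytes.readUInt8(8) & 0x3f) | 0x80, 8) // set RFC 4122 variant
  const hex = bytes.toString("hex")
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-")
}

// Hypothetical helper: the same file path always maps to the same point ID,
// so re-indexing a changed file overwrites its previous entry.
function pointIdForPath(filePath: string): string {
  return uuidV5(filePath, QDRANT_CODE_BLOCK_NAMESPACE)
}
```

Because the ID depends only on the namespace and the path, no lookup table between paths and stored IDs is needed.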
@@ -0,0 +1,4 @@
import { extensions as allExtensions } from "../../tree-sitter"

// Filter out markdown extensions for the scanner
Is this just because markdown results aren’t that useful?
This was an issue I had at the beginning: our markdown parser wasn't really parsing markdown in a meaningful way. However, I now have a simple parser that works as a fallback, so I should be able to re-enable markdown parsing.
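For readers following the thread, excluding markdown from the scanner could look roughly like the sketch below. The extension list and the `MARKDOWN_EXTENSIONS` set are stand-ins invented for this example; the actual filter line is not shown in the diff excerpt.

```typescript
// Stand-in for the extension list exported by the tree-sitter module (assumed shape).
const allExtensions: string[] = [".ts", ".tsx", ".py", ".md", ".markdown", ".rs"]

// Hypothetical set of extensions treated as markdown.
const MARKDOWN_EXTENSIONS = new Set([".md", ".markdown"])

// Extensions the scanner would pick up once markdown is excluded.
const scannerExtensions = allExtensions.filter((ext) => !MARKDOWN_EXTENSIONS.has(ext.toLowerCase()))
```

Re-enabling markdown would then amount to dropping the filter once the fallback parser is in place.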
💥
Can you fix the test and then we can merge? 🙏
…ngToFinish and using direct event accumulation
…g name in ClineProvider tests
We have a winner!!
I have problems getting Ollama to work on Windows 10.
Context
This PR implements codebase indexing using OpenAI embedding models and Qdrant as the vector store.
It also adds a new tool, codebase_search, which allows Roo to search the indexed code using natural language. The implementation reuses the tree-sitter queries already present in the project.
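To illustrate what a natural-language search over indexed code boils down to, here is a minimal in-memory sketch. In the PR the vectors come from an embedding model and the search is delegated to Qdrant; the types and the `searchBlocks` function below are hypothetical, and the cosine ranking is only an assumption about how similarity is scored.

```typescript
// Same value as the constant shown in the diff.
const MAX_SEARCH_RESULTS = 50

interface IndexedBlock {
  filePath: string
  content: string
  vector: number[] // embedding of `content`
}

interface SearchHit {
  filePath: string
  content: string
  score: number
}

// Cosine similarity between two vectors; returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const x = a[i] ?? 0
    const y = b[i] ?? 0
    dot += x * y
    normA += x * x
    normB += y * y
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Rank every block against the query embedding and keep the best `limit` hits.
function searchBlocks(queryVector: number[], blocks: IndexedBlock[], limit: number = MAX_SEARCH_RESULTS): SearchHit[] {
  return blocks
    .map((block) => ({
      filePath: block.filePath,
      content: block.content,
      score: cosineSimilarity(queryVector, block.vector),
    }))
    .sort((left, right) => right.score - left.score)
    .slice(0, limit)
}
```

A real vector store avoids this linear scan by using an approximate nearest-neighbour index, which is why the PR hands the search to Qdrant.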
Implementation
The implementation uses tree-sitter to parse code files. The list of compatible files is already defined in the project and is based on the queries already implemented. A code segment has to fall within a certain size range to be indexed; if a segment is too big, the parser processes each of its children individually instead.
The quality of the code segments depends on the tree sitter queries already defined.
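The size rule described above can be sketched as a small recursive function. This is an illustration under stated assumptions: `SyntaxNode` is a simplified stand-in for a tree-sitter node, and the two thresholds are made-up values, not the ones used in the PR.

```typescript
// Simplified stand-in for a tree-sitter node (assumed shape).
interface SyntaxNode {
  text: string
  children: SyntaxNode[]
}

// Hypothetical thresholds, measured in characters.
const MIN_BLOCK_CHARS = 10
const MAX_BLOCK_CHARS = 40

// Collect indexable segments: skip tiny nodes, split oversized nodes into their children.
function collectBlocks(node: SyntaxNode, out: string[] = []): string[] {
  const size = node.text.length
  if (size < MIN_BLOCK_CHARS) {
    return out // too small to be worth indexing
  }
  if (size > MAX_BLOCK_CHARS && node.children.length > 0) {
    for (const child of node.children) {
      collectBlocks(child, out)
    }
    return out
  }
  // Within bounds, or an oversized leaf that cannot be split further (kept whole in this sketch).
  out.push(node.text)
  return out
}
```

How an oversized node without children should be handled is a design choice; this sketch keeps it as a single segment, while other strategies would split it by lines.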
After the initial indexing, the codebase indexing service starts a file watcher that tracks changes to the files it has already indexed, including deletions.
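The effect of watcher events on the index can be sketched with an in-memory map. The event shape and the `InMemoryIndex` class are hypothetical; the actual service would re-parse and re-embed a changed file and write the result to the vector store.

```typescript
// Hypothetical event shape emitted by a file watcher.
type FileEvent =
  | { kind: "created" | "changed"; filePath: string; blocks: string[] }
  | { kind: "deleted"; filePath: string }

// Toy index keyed by file path, standing in for the vector store.
class InMemoryIndex {
  private blocksByFile = new Map<string, string[]>()

  apply(event: FileEvent): void {
    if (event.kind === "deleted") {
      // Remove every block of a deleted file so nothing is left orphaned.
      this.blocksByFile.delete(event.filePath)
      return
    }
    // Replace rather than append, so blocks from the previous version do not linger.
    this.blocksByFile.set(event.filePath, [...event.blocks])
  }

  blocksFor(filePath: string): string[] {
    return this.blocksByFile.get(filePath) ?? []
  }

  get fileCount(): number {
    return this.blocksByFile.size
  }
}
```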
Screenshots
How to Test
The codebase_search tool will be made available to new tasks.
Note: The tool is added to the system prompt conditionally. If the service is disabled, the tool is omitted from the system prompt entirely.
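The conditional behaviour in the note above can be pictured like this. It is a sketch only: the option name, the placeholder base tool names, and both functions are invented for illustration and do not reflect the project's actual prompt-building code.

```typescript
// Hypothetical flag describing whether the indexing service is enabled.
interface ToolAvailability {
  codebaseIndexEnabled: boolean
}

// Placeholder names for tools that are always present (illustrative only).
const BASE_TOOLS: string[] = ["tool_a", "tool_b"]

// codebase_search is listed only when the indexing service is enabled.
function availableTools(options: ToolAvailability): string[] {
  return options.codebaseIndexEnabled ? [...BASE_TOOLS, "codebase_search"] : [...BASE_TOOLS]
}

// Render the tool list as it might appear in a system prompt section.
function buildToolSection(options: ToolAvailability): string {
  return availableTools(options)
    .map((name) => `- ${name}`)
    .join("\n")
}
```

Leaving the tool out of the prompt, rather than listing it as unavailable, keeps the model from trying to call something that cannot work.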
Get in Touch
You know where to find me.
Important
Implements codebase indexing with OpenAI embeddings and Qdrant, adding a codebase_search tool for natural language code search, with comprehensive configuration and UI support.
codebase_search tool for natural language code search.
CodeIndexConfigManager for managing configuration in config-manager.ts.
CodeIndexSettings in CodeIndexSettings.tsx.
zod schemas.
CodeIndexManager orchestrates indexing and searching.
QdrantVectorStore handles vector operations with Qdrant.
CodeParser and DirectoryScanner parse and scan code files.
FileWatcher monitors file changes.
CodebaseSearchResultsDisplay and CodebaseSearchResult components display search results.
SettingsView to include codebase indexing settings.
ExtensionMessage and WebviewMessage to handle new message types.
tools.ts.
This description was created for 9f96628. You can customize this summary. It will automatically update as commits are pushed.