Feature/document RAG Tool by arunmenon9 · Pull Request #1557 · ComposioHQ/composio · GitHub

Feature/document RAG Tool #1557


Open
wants to merge 5 commits into
base: master

Conversation

arunmenon9 commented Apr 20, 2025

Sweep Summary

Adds a new Document RAG Tool that generates embeddings from documents and enables semantic search and question answering on document content.

  • Implemented DocumentRagTool with two main actions: UploadDocument for processing various document formats (PDF, DOC, TXT, CSV) and QueryDocument for retrieving information using LangChain and ChromaDB.
  • Added document collection management to organize uploaded documents with metadata and enable targeted searches.
  • Created an example agent implementation in document_rag_agent/main.py that demonstrates how to use the tool for document processing and question answering.
  • Included comprehensive documentation and setup scripts for easy integration and usage.
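
To make the upload-then-query flow in the summary concrete, here is a minimal, dependency-free sketch. It is not the PR's implementation: the PR uses LangChain and ChromaDB, whereas this stand-in uses a bag-of-words "embedding" and an in-memory list. All names (`chunk_text`, `embed`, `ToyCollection`) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (stand-in for a text splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (not a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyCollection:
    """In-memory stand-in for a vector store collection with per-chunk metadata."""

    def __init__(self) -> None:
        self.records: list[tuple[str, Counter, dict]] = []

    def upload(self, text: str, source: str) -> int:
        """Chunk the text, embed each chunk, and store it with its source."""
        chunks = chunk_text(text)
        for chunk in chunks:
            self.records.append((chunk, embed(chunk), {"source": source}))
        return len(chunks)

    def query(self, question: str, k: int = 1) -> list[tuple[str, dict]]:
        """Return the k stored chunks most similar to the question."""
        q = embed(question)
        ranked = sorted(self.records, key=lambda r: cosine(q, r[1]), reverse=True)
        return [(chunk, meta) for chunk, _, meta in ranked[:k]]


store = ToyCollection()
store.upload("Invoices are due within thirty days of receipt.", source="billing.txt")
store.upload("The office is closed on public holidays.", source="hr.txt")
top_chunk, meta = store.query("When are invoices due?")[0]
print(meta["source"], "->", top_chunk)
```

In a real pipeline the retrieved chunks would be passed to a language model as context for answering the question; the sketch stops at retrieval.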


Changes made:

  1. Added a new tool that generates embeddings for a given PDF, document, or folder.
  2. Added the agent code that uses the document RAG tool.

Please note:

The agent-side commits in this PR are not production ready. I was not able to register the tool with Composio tools, so running the agent throws a "tool not found" error.
Working code is pushed to the repository below. The only difference is that it loads the tools directly in the agent instead of going through the composio library.

GitHub: https://github.com/arunmenon9/document-rag

vercel bot commented Apr 20, 2025

The latest updates on your projects.

Name Status Preview Comments Updated (UTC)
composio ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 20, 2025 4:24pm


Review Summary

Skipped posting 5 drafted comments based on your review threshold. Feel free to update them here.

Draft Comments
python/composio/tools/local/documentragtool/actions/document_rag_action.py:191-191
`processed_files` is only defined in the directory branch, so referencing it in the file branch for `file_count` will raise a `NameError` if a single file is uploaded.

Scores:

  • Production Impact: 3
  • Fix Specificity: 5
  • Urgency Impact: 3
  • Total Score: 11

Reason for filtering: The total score does not exceed the required threshold of 14. While the bug could cause a runtime error, it is not guaranteed to crash production in all cases, the fix is clear but not highly urgent, and the overall impact is moderate.

Analysis: The bug could cause a NameError if a single file is uploaded, but this is a moderate production risk rather than a guaranteed crash. The fix is direct and clear, but the urgency is not critical. The total score is below the required threshold, so the comment should be removed.
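
The pattern described in this draft comment can be reproduced in isolation. The sketch below is an assumption about the shape of the code, not a copy of `document_rag_action.py`; the function names are hypothetical. It shows the failing variant and one possible fix that binds `processed_files` on both branches.

```python
import tempfile
from pathlib import Path


def count_files_buggy(path: Path) -> dict:
    """Mimics the reported pattern: the list is bound only for directories."""
    if path.is_dir():
        processed_files = [p for p in path.iterdir() if p.is_file()]
    # No assignment happens for a single file, so the next line fails.
    return {"file_count": len(processed_files)}


def count_files_fixed(path: Path) -> dict:
    """Binds processed_files on both branches before it is read."""
    if path.is_dir():
        processed_files = [p for p in path.iterdir() if p.is_file()]
    else:
        processed_files = [path]
    return {"file_count": len(processed_files)}


with tempfile.TemporaryDirectory() as tmp:
    single = Path(tmp) / "report.txt"
    single.write_text("hello")
    try:
        count_files_buggy(single)
    except NameError as exc:  # UnboundLocalError is a subclass of NameError
        print(f"buggy version raised {type(exc).__name__}")
    print(count_files_fixed(single))
```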

python/composio/tools/local/documentragtool/actions/document_rag_action.py:333-333
The `ChatOpenAI` model name 'gpt-4.1-nano' may not exist or be available, which can cause runtime errors if the model is not supported.

Scores:

  • Production Impact: 3
  • Fix Specificity: 5
  • Urgency Impact: 3
  • Total Score: 11

Reason for filtering: The total score does not exceed the required threshold of 14 for inclusion. The issue is unlikely to cause immediate production crashes, the fix is clear, but the urgency is not critical.

Analysis: The bug could cause runtime errors if the model does not exist, but this is not guaranteed to happen in all environments. The fix is direct and specific, but the urgency is moderate since the system will not be immediately impaired. The total score is below the threshold, so the comment should be removed.
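
One way to reduce the risk raised here is to avoid hard-coding the model name. The sketch below reads it from an environment variable and falls back to the name used in the PR. The variable name `DOCUMENT_RAG_MODEL` and the helper `resolve_model_name` are hypothetical, not part of the PR or of Composio.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "gpt-4.1-nano"  # the model name flagged in the draft comment


def resolve_model_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured model name, or the default when none is set."""
    env = os.environ if env is None else env
    # DOCUMENT_RAG_MODEL is an assumed variable name for this illustration.
    name = env.get("DOCUMENT_RAG_MODEL", "").strip()
    return name or DEFAULT_MODEL


print(resolve_model_name({"DOCUMENT_RAG_MODEL": "some-other-model"}))
print(resolve_model_name({}))
```

The resolved name could then be passed as the `model` argument when constructing `ChatOpenAI`, so that switching models does not require a code change.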

python/composio/tools/local/documentragtool/tool.py:11-11
Missing newline at end of file can cause issues with some tools and POSIX compliance; add a newline at the end.

Scores:

  • Production Impact: 1
  • Fix Specificity: 5
  • Urgency Impact: 1
  • Total Score: 7

Reason for filtering: The comment addresses a missing newline at end of file, which has minimal production impact and urgency. The total score does not meet the required threshold for inclusion.

Analysis: Missing newline is a minor style/compatibility issue, not a production risk. The fix is clear, but urgency and impact are very low, so the comment should be removed under aggressive filtering.
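
For completeness, the fix amounts to appending a single newline when the file does not already end with one. A small sketch, using a hypothetical helper name:

```python
def ensure_trailing_newline(text: str) -> str:
    """Return text ending in a newline; empty text is left unchanged."""
    if text == "" or text.endswith("\n"):
        return text
    return text + "\n"


source = "class DocumentRagTool:\n    pass"
print(repr(ensure_trailing_newline(source)))
```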

python/examples/quickstarters/document_rag_agent/main.py:108-124
`is_complex_query` is defined but never used, so all queries are always processed via the agent, making the "simple queries will be processed directly" message misleading.

Scores:

  • Production Impact: 2
  • Fix Specificity: 3
  • Urgency Impact: 2
  • Total Score: 7

Reason for filtering: The total score does not exceed the required threshold of 14. The bug described does not have a high production impact, the fix is not highly specific, and the urgency is low.

Analysis: The issue is about misleading messaging and an unused function, which does not cause crashes or major failures. The fix is somewhat general, and the urgency is low since it does not affect core functionality.
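
If the intent is for simple queries to bypass the agent, the check has to be called at the point where the query is dispatched. The sketch below shows that wiring. The heuristic inside `is_complex_query` is invented for illustration, since the review does not show the PR's actual rule, and `answer`, `direct`, and `agent` are hypothetical names.

```python
from typing import Callable


def is_complex_query(query: str) -> bool:
    """Hypothetical heuristic: long queries or analytical verbs count as complex."""
    markers = ("compare", "summarize", "explain", " and ")
    lowered = query.lower()
    return len(query.split()) > 12 or any(marker in lowered for marker in markers)


def answer(query: str, direct: Callable[[str], str], agent: Callable[[str], str]) -> str:
    """Route the query to the direct handler or the agent based on the check."""
    handler = agent if is_complex_query(query) else direct
    return handler(query)


# Stub handlers stand in for the real direct-lookup and agent paths.
print(answer("What is the due date?", lambda q: "direct", lambda q: "agent"))
print(answer("Compare the two contracts", lambda q: "direct", lambda q: "agent"))
```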

python/examples/quickstarters/document_rag_agent/setup.sh:21-26
Each `echo` to `.env` overwrites the previous content, so only the last API key remains; this prevents both keys from being present. Use `>>` (append) for all but the first write.

Scores:

  • Production Impact: 2
  • Fix Specificity: 5
  • Urgency Impact: 3
  • Total Score: 10

Reason for filtering: The total score does not exceed the required threshold of 14. While the bug could cause confusion or loss of configuration, it does not have a high production impact, the fix is clear, and the urgency is moderate.

Analysis: Overwriting the .env file could cause missing configuration, but it is unlikely to cause a production crash or critical failure. The fix is very clear and directly applicable. The urgency is moderate since it should be fixed to avoid confusion, but it is not critical.
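
The overwrite-versus-append behavior is easy to demonstrate. The sketch below is a Python analogue of the shell redirections, not the contents of `setup.sh`: opening a file in `"w"` mode truncates it as `>` does, and `"a"` mode appends as `>>` does. The key names are placeholders.

```python
import tempfile
from pathlib import Path


def write_env_overwriting(path: Path, pairs: dict) -> None:
    """Mirrors `echo ... > .env` for every key: each write truncates the file."""
    for key, value in pairs.items():
        path.write_text(f"{key}={value}\n")


def write_env_appending(path: Path, pairs: dict) -> None:
    """Mirrors `>` for the first key and `>>` for the rest."""
    for index, (key, value) in enumerate(pairs.items()):
        with path.open("w" if index == 0 else "a") as handle:
            handle.write(f"{key}={value}\n")


# Placeholder key names; the actual keys in setup.sh are not shown in the review.
keys = {"FIRST_API_KEY": "aaa", "SECOND_API_KEY": "bbb"}

with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    write_env_overwriting(env_file, keys)
    print(env_file.read_text().splitlines())  # only the last key survives
    write_env_appending(env_file, keys)
    print(env_file.read_text().splitlines())  # both keys are present
```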
