-
Notifications
You must be signed in to change notification settings - Fork 2.2k
feat: add support for complex types in component processing #9305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
e510b69
d5331c1
3e72fcf
c41aae9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,7 +6,7 @@ | |
import os | ||
from copy import deepcopy | ||
from dataclasses import dataclass | ||
from typing import Dict, List | ||
from typing import Dict, List, Union, Optional, Any | ||
|
||
import pytest | ||
|
||
|
@@ -122,6 +122,26 @@ def run(self, documents: List[Document], top_k: int = 5) -> Dict[str, str]: | |
return {"concatenated": "\n".join(doc.content for doc in documents[:top_k])} | ||
|
||
|
||
@component | ||
class ComplexTypeProcessor: | ||
"""A component that processes complex types.""" | ||
|
||
@component.output_types(result=str) | ||
def run( | ||
self, meta: Union[Dict[str, Any], List[Dict[str, Any]]] = None, extraction_kwargs: Optional[Dict[str, Any]] = None | ||
) -> Dict[str, str]: | ||
""" | ||
Processes complex types like dictionaries and unions. | ||
|
||
:param meta: Optional metadata to attach, can be a dictionary or list of dictionaries | ||
:param extraction_kwargs: Optional dictionary containing keyword arguments to customize the extraction process | ||
:return: A dictionary with the result | ||
""" | ||
meta_str = str(meta) if meta else "No metadata" | ||
kwargs_str = str(extraction_kwargs) if extraction_kwargs else "No kwargs" | ||
return {"result": f"Meta: {meta_str}, Kwargs: {kwargs_str}"} | ||
|
||
|
||
def output_handler(old, new): | ||
""" | ||
Output handler to test serialization. | ||
|
@@ -335,6 +355,39 @@ def foo(self, text: str): | |
with pytest.raises(ValueError): | ||
ComponentTool(component=not_a_component, name="invalid_tool", description="This should fail") | ||
|
||
def test_from_component_with_complex_types(self): | ||
component = ComplexTypeProcessor() | ||
|
||
tool = ComponentTool(component=component) | ||
|
||
# Check the parameter schema | ||
assert "meta" in tool.parameters["properties"] | ||
assert "extraction_kwargs" in tool.parameters["properties"] | ||
|
||
# Meta should be oneOf with both object and array options | ||
meta_schema = tool.parameters["properties"]["meta"] | ||
assert meta_schema["description"].startswith("Optional metadata") | ||
assert "oneOf" in meta_schema | ||
|
||
# extraction_kwargs should be an object | ||
kwargs_schema = tool.parameters["properties"]["extraction_kwargs"] | ||
assert kwargs_schema["type"] == "object" | ||
assert "additionalProperties" in kwargs_schema | ||
assert kwargs_schema["additionalProperties"] is True | ||
Comment on lines
+363
to
+376
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I appreciate the additional comments and individual asserts, but could we move update this to a full dict comparison. So assert tool.parameters["properties"] == {...} I think that would make it easier to judge at a glance. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it is better for readability. But no problem, I will do it |
||
|
||
# Test tool invocation with dict | ||
result = tool.invoke(meta={"source": "web"}, extraction_kwargs={"timeframe": "last month"}) | ||
assert isinstance(result, dict) | ||
assert "result" in result | ||
assert "Meta: {'source': 'web'}" in result["result"] | ||
assert "Kwargs: {'timeframe': 'last month'}" in result["result"] | ||
|
||
# Test tool invocation with list of dicts | ||
result = tool.invoke(meta=[{"id": 1}, {"id": 2}]) | ||
assert isinstance(result, dict) | ||
assert "result" in result | ||
assert "Meta: [{'id': 1}, {'id': 2}]" in result["result"] | ||
|
||
|
||
## Integration tests | ||
class TestToolComponentInPipelineWithOpenAI: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity why do we need this special case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! The reason we single out pure Dict[str, Any] parameters whose description mentions “meta” is that metadata blobs by definition have an arbitrary, unknown shape, and we don’t want to bake an open-ended object schema (with unpredictable keys) into our function spec.
additionalProperties: true
, undertools_strict
I would have to enumerate every possible metadata key (which we can’t know ahead of time), and LLMs often struggle to fill such free-form nested schemas.additionalProperties: true
.