-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Adding optional structured generation to CodeAgent #1346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…s and removed pydantic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! I have a few general comments and questions that could help clarify and improve the PR.
First, it would be really helpful to outline which models currently support structured generation and which don't, as well as the exact parameter names they use for this feature. This context is important to determine the appropriate level in the codebase where the new parameter should be implemented, especially if the enhancement aims to be general across models. Additionally, rather than introducing a custom model parameter name like response_format
, it might be better to align with existing conventions to maintain consistency and reduce confusion.
And finally, it would be great to see specific tests covering the new structured generation feature. Tests not only verify correctness but also serve as executable documentation, which benefits future maintainers and users.
src/smolagents/agents.py
Outdated
grammar (`dict[str, str]`, *optional*): Grammar used to parse the LLM output. | ||
response_format (`dict[str, str]`, *optional*): Response format used to parse the LLM output. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you replace grammar? I would say we want to support both:
response_format
: for structured datagrammar
: for more flexible output format that JSON Schema can't describe (like programming languages, structured natural language, etc.)
And, even if we really wanted to replace grammar
, I would strongly recommend to start a proper deprecation cycle to avoid breaking existing users' code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
response_format
seems more aligned with the argument used by most, but we'd need to document it to justify the change: so we need a doc showing the parameters used by most API providers, as we discussed @akseljoonas and suggested again by @albertvillanova above. This will also be necessary to track support for JSON/regex schemas across providers- Agree as well with the proper deprecation cycle! For this you can use a warning as we do here:
smolagents/src/smolagents/agents.py
Line 515 in bf44ee1
def logs(self):
Nice PR, thanks a lot!
Finally, about separating grammar from response_format: for me since these parameters would do the same thing, it's probably better to rename/not rename but keep them one parameter. |
Sorry, I forgot to add the document showing the Models that support structured generation. You can find it here. The Main reason for renaming
What do you think? @albertvillanova @aymeric-roucher |
src/smolagents/models.py
Outdated
tools_to_call_from: list[Tool] | None = None, | ||
**kwargs, | ||
) -> Generator[ChatMessageStreamDelta]: | ||
generation_kwargs = self._prepare_completion_args( | ||
messages=messages, | ||
stop_sequences=stop_sequences, | ||
grammar=grammar, | ||
response_format=None, # Transformers doesn't support structured generation, use VLLMModel for that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid this behaviour, since this create a silent deviation from user expectations, leading to hard-to-debug issues : instead, it's not a problem to just raise an error inside the model if the user tried to enforce structured generation with a TransformersModel
@@ -172,6 +172,15 @@ def parse_json_blob(json_blob: str) -> tuple[dict[str, str], str]: | |||
) | |||
|
|||
|
|||
def extract_code_from_text(text: str) -> str | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akseljoonas I unified code blob parsing functions in one!
f3d9b4c
to
7655f48
Compare
9d59f6b
to
446c138
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a general comment/question before diving into the PR details.
From what I see, this PR introduces an internal setting that primarily affects the agent's handling of "code" and "thought" blocks, rather than enabling broader structured output capabilities.
For instance, it is not intended to support a use case where a user requests something like
“Give me the 10 most cited references about AI agents”
and requests the final answer to conform to a user-defined JSON schema by passing use_structured_output
.
Therefore, if this is indeed more of an internal formatting control, I wonder if the parameter name use_structured_output
might be a bit misleading: users might reasonably expect it to influence the structure of the actual task output, not just internal representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I can fix the remaining tests and propose another way for the deprecation if you want.
src/smolagents/agents.py
Outdated
self.grammar = grammar | ||
if grammar is not None: | ||
warnings.warn( | ||
"The `grammar` argument is deprecated and will have no effect on agent behavior.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better specifying the removal version.
"The `grammar` argument is deprecated and will have no effect on agent behavior.", | |
"Argument 'grammar' is deprecated and will be removed in version 1.20. ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/smolagents/agents.py
Outdated
if grammar is not None: | ||
warnings.warn( | ||
"The `grammar` argument is deprecated and will have no effect on agent behavior.", | ||
DeprecationWarning, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we should propose using use_structured_output
instead, but not sure in this case because use_structured_output
is a CodeAgent-only param and is not exactly equivalent.
src/smolagents/agents.py
Outdated
if grammar is not None: | ||
warnings.warn( | ||
"The `grammar` argument is deprecated and will have no effect on agent behavior.", | ||
DeprecationWarning, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use a FutureWarning
:
DeprecationWarning, | |
FutureWarning, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/smolagents/agents.py
Outdated
@@ -922,7 +935,6 @@ def to_dict(self) -> dict[str, Any]: | |||
"prompt_templates": self.prompt_templates, | |||
"max_steps": self.max_steps, | |||
"verbosity_level": int(self.logger.level), | |||
"grammar": self.grammar, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick note regarding the deprecation because I think it is not well implemented. During the deprecation cycle, the grammar parameter should maintain its previous behavior to ensure backward compatibility, just with the addition of a deprecation warning to inform users of the upcoming change.
This approach helps avoid unexpected breakages for users relying on the current behavior while giving them time to adapt.
I can take care of this if you want.
@@ -172,6 +172,15 @@ def parse_json_blob(json_blob: str) -> tuple[dict[str, str], str]: | |||
) | |||
|
|||
|
|||
def extract_code_from_text(text: str) -> str | None: | |||
"""Extract code from the LLM's output.""" | |||
pattern = r"```(?:py|python)?\s*\n(.*?)\n```" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't checked this in detail, but according to the system prompt, there should not be triple bacticks, should there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! The code extraction from JSON is broken after the last commits. Communication issue, I'll fix it tomorrow morning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @aymeric-roucher @albertvillanova! Can you check the code extraction from JSON for any potential security issues and logic? :)
3f54ff8
to
98edc0a
Compare
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
src/smolagents/agents.py
Outdated
code_action = json.loads(output_text)["code"] | ||
code_action = extract_code_from_text(code_action) or code_action | ||
try: | ||
code_action, _ = parse_json_blob(output_text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akseljoonas I don't like this if/else : the two functions seem to be doing the same thing.
Moreover this try/catch logic does not need to be in agents.py: so let's just integrate all the logic in parse_json_blob
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And more generally since the issue you're trying to solve is specifically due to the InferenceClient
provider together
not really respecting structured outputs for one specific model, I'm not convinced it needs handling in smolagents anyway. Let's just not enable it for now and open a further PR to handle it, where you can document the specific issue, which providers/modls it affects, and why solve it with complex regex.
src/smolagents/utils.py
Outdated
`str`: The parsed code. | ||
""" | ||
|
||
code_match = re.search(r'"code"\s*:\s*"(.*?)"(?=\s*}\s*$|\s*,\s*")', text, re.DOTALL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lookahead (.*?) could lead to backtracking, so this is a security risk that should be handled with test cases. Let's put this in the next PR, cf other comment.
bedeff2
to
ef94545
Compare
Thanks @akseljoonas ! Merging this, and let's handle the edge cases for some providers in a subsequent PR (I'll open it) @albertvillanova i change the argument to |
Adds a boolean
use_structured_generation
toCodeAgent
, which turns the "Thought:.. Code:..." response pattern into JSON-structured generation. It has been shown to increase model performance on benchmarks in internal tests.If
use_structured_generation
is set to True, the LLM output will look like this:To make this work, I added a modified prompt that implements the few-shot examples in the new JSON format. Note: The LLM is prompted not to generate strictly valid JSON, since this helps some models' performance by removing formatting overhead.