Adding optional structured generation to CodeAgent by akseljoonas · Pull Request #1346 · huggingface/smolagents · GitHub

Adding optional structured generation to CodeAgent #1346


Merged
merged 30 commits into from
May 22, 2025

Conversation

akseljoonas
Collaborator

Adds a boolean use_structured_generation flag to CodeAgent, which turns the "Thought: ... Code: ..." response pattern into JSON-structured generation. In internal tests, this has been shown to increase model performance on benchmarks.

If use_structured_generation is set to True, the LLM output will look like this:

{
  "thought": " ... ", 
  "code": " ... "
}

To make this work, I added a modified prompt that presents the few-shot examples in the new JSON format. Note: the LLM is deliberately prompted not to generate strictly valid JSON, since relaxing the formatting constraint improves some models' performance by removing formatting overhead.
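For illustration, a minimal sketch of consuming such a structured output (the variable names are hypothetical, not the PR's actual internals):

```python
import json

# Example structured output in the format described above.
raw_output = '{"thought": "I need to compute 2 + 2.", "code": "result = 2 + 2\\nprint(result)"}'

parsed = json.loads(raw_output)
thought = parsed["thought"]  # the model's reasoning
code = parsed["code"]        # the code block to execute
```

Compared with parsing the free-form "Thought: ... Code: ..." pattern with a regex, a JSON payload makes the two fields unambiguous to extract.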

Member
@albertvillanova albertvillanova left a comment

Thanks for the contribution! I have a few general comments and questions that could help clarify and improve the PR.

First, it would be really helpful to outline which models currently support structured generation and which don't, as well as the exact parameter names they use for this feature. This context is important to determine the appropriate level in the codebase where the new parameter should be implemented, especially if the enhancement aims to be general across models. Additionally, rather than introducing a custom model parameter name like response_format, it might be better to align with existing conventions to maintain consistency and reduce confusion.

And finally, it would be great to see specific tests covering the new structured generation feature. Tests not only verify correctness but also serve as executable documentation, which benefits future maintainers and users.

Comment on lines 181 to 210
grammar (`dict[str, str]`, *optional*): Grammar used to parse the LLM output.
response_format (`dict[str, str]`, *optional*): Response format used to parse the LLM output.
Member

Why do you replace grammar? I would say we want to support both:

  • response_format: for structured data
  • grammar: for more flexible output format that JSON Schema can't describe (like programming languages, structured natural language, etc.)

And even if we really wanted to replace grammar, I would strongly recommend starting a proper deprecation cycle to avoid breaking existing users' code.

Collaborator

  • response_format seems more aligned with the argument used by most providers, but we'd need to document it to justify the change: we need a doc showing the parameters used by the main API providers, as we discussed, @akseljoonas, and as suggested again by @albertvillanova above. This will also be necessary to track support for JSON/regex schemas across providers.
  • Agree as well with the proper deprecation cycle! For this you can use a warning as we do here:
    def logs(self):

@aymeric-roucher
Collaborator

Nice PR, thanks a lot!
Agree with @albertvillanova's comment above, in particular:

  • Why change from grammar to response_format? This can be warranted: we discussed in person that response_format might be more frequently used. But then we need to document this with a list across API providers.
  • Rather than adding this new _json_step method, re-use previous methods, as suggested above by Albert.

Finally, about separating grammar from response_format: since these parameters would do the same thing, whether we rename or not, it's probably better to keep them as a single parameter.

@akseljoonas
Collaborator Author
akseljoonas commented May 19, 2025

Sorry, I forgot to add the document showing the models that support structured generation. You can find it here. The main reason for renaming grammar was that passing it to any of the providers didn't work: in the API calls it had to be response_format anyway.

  • In general, it seems like inference/model providers are taking OpenAI's lead and are moving away from supporting structured regex and grammar generation in favour of JSON only. Right now, VLLM and HF are the only ones supporting grammar+regex afaik.
  • For MultiStepAgent, maybe it makes sense to be able to turn on the use_structured_output and use the default JSON schema for the added benefit, but still be able to provide your own custom schema if more granular control is needed.
    • In that case, we could still support regex and grammars by providing response_format = { "type": "regex", ... } instead of { "type": "json_schema", ... }. However, more mappings would then be needed for VLLM and HFInference.
    • Maybe rename use_structured_output to use_default_structured_output in that case.
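For concreteness, the two payload shapes discussed above might look like this (a sketch following the OpenAI-style convention; field names are assumptions, not the PR's final API):

```python
# JSON-schema structured output (the default case).
json_schema_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "code_agent_output",
        "schema": {
            "type": "object",
            "properties": {
                "thought": {"type": "string"},
                "code": {"type": "string"},
            },
            "required": ["thought", "code"],
        },
    },
}

# Regex-constrained output (only supported by some backends, e.g. VLLM).
regex_format = {
    "type": "regex",
    "value": r"Thought: .+?\nCode:\n```(?:py|python)?\n[\s\S]+?\n```",
}
```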

What do you think? @albertvillanova @aymeric-roucher

@aymeric-roucher
Collaborator

So, let's try to make this API as simple as possible:

  • IMO, if we have an argument grammar or response_format, it should be an argument that lets people specify their grammar/response format.
  • Here, since the vast majority of providers (cf. Aksel's doc in the screenshot below) only support JSON in terms of grammar/response_format/structured generation (for me these terms are synonymous), as opposed to regex or other types, we might as well support only JSON.
[Screenshot: table of structured-generation support across API providers]

In our specific case, the usage of grammar is simplified by a few things:

  • Only used in CodeAgent, since ToolCallingAgent already uses tool-calling APIs from models, which already have structured generation.
  • It will always be the same: one thought block and one action block (a code block in the case of CodeAgent). So there's no real need for customization, and thus no real need IMO to accept a complex object like response_format; this can be handled under the hood.

As a result, I think it's simpler to remove the grammar argument from all agents (with a deprecation warning; anyway, this arg isn't doing anything in most models), and replace it only for CodeAgent with a simple flag like use_grammar or use_structured_outputs, which when passed simply enforces that the model follows our default response_format.
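A minimal sketch of this flag-based approach (the helper and constant names here are hypothetical, for illustration only):

```python
# Default response format enforcing one "thought" field and one "code" field.
CODEAGENT_RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "thought_and_code",
        "schema": {
            "type": "object",
            "properties": {"thought": {"type": "string"}, "code": {"type": "string"}},
            "required": ["thought", "code"],
        },
    },
}


def build_generation_kwargs(use_structured_outputs: bool) -> dict:
    """Enforce the default response_format only when the flag is set."""
    kwargs: dict = {}
    if use_structured_outputs:
        kwargs["response_format"] = CODEAGENT_RESPONSE_FORMAT
    return kwargs
```

The user-facing surface is then a single boolean; the schema itself stays an internal detail.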

tools_to_call_from: list[Tool] | None = None,
**kwargs,
) -> Generator[ChatMessageStreamDelta]:
generation_kwargs = self._prepare_completion_args(
messages=messages,
stop_sequences=stop_sequences,
grammar=grammar,
response_format=None, # Transformers doesn't support structured generation, use VLLMModel for that
Collaborator

We should avoid this behaviour, since it creates a silent deviation from user expectations, leading to hard-to-debug issues. Instead, it's fine to just raise an error inside the model if the user tries to enforce structured generation with a TransformersModel.
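A sketch of the fail-loudly behavior suggested here (class and method shapes are simplified for illustration, not the library's actual signatures):

```python
class TransformersModelSketch:
    """Illustrative stand-in for TransformersModel: reject structured generation explicitly."""

    def _prepare_completion_args(self, messages, response_format=None, **kwargs):
        if response_format is not None:
            raise ValueError(
                "TransformersModel does not support structured generation. "
                "Use VLLMModel for structured outputs."
            )
        return {"messages": messages, **kwargs}
```

Raising immediately surfaces the unsupported combination at call time instead of letting the request silently run unconstrained.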

@@ -172,6 +172,15 @@ def parse_json_blob(json_blob: str) -> tuple[dict[str, str], str]:
)


def extract_code_from_text(text: str) -> str | None:
Collaborator
@aymeric-roucher aymeric-roucher May 21, 2025

@akseljoonas I unified code blob parsing functions in one!

@aymeric-roucher aymeric-roucher force-pushed the structured-generation branch from 9d59f6b to 446c138 Compare May 21, 2025 09:49
Member
@albertvillanova albertvillanova left a comment

Just a general comment/question before diving into the PR details.

From what I see, this PR introduces an internal setting that primarily affects the agent's handling of "code" and "thought" blocks, rather than enabling broader structured output capabilities.

For instance, it is not intended to support a use case where a user requests something like

“Give me the 10 most cited references about AI agents”

and requests the final answer to conform to a user-defined JSON schema by passing use_structured_output.

Therefore, if this is indeed more of an internal formatting control, I wonder if the parameter name use_structured_output might be a bit misleading: users might reasonably expect it to influence the structure of the actual task output, not just internal representation.

Member
@albertvillanova albertvillanova left a comment

Thanks, I can fix the remaining tests and propose another way for the deprecation if you want.

self.grammar = grammar
if grammar is not None:
warnings.warn(
"The `grammar` argument is deprecated and will have no effect on agent behavior.",
Member

Better to specify the removal version:

Suggested change
"The `grammar` argument is deprecated and will have no effect on agent behavior.",
"Argument 'grammar' is deprecated and will be removed in version 1.20. ",

Member

Done.

if grammar is not None:
warnings.warn(
"The `grammar` argument is deprecated and will have no effect on agent behavior.",
DeprecationWarning,
Member

Also, we would normally suggest using use_structured_output instead; but I'm not sure in this case, because use_structured_output is a CodeAgent-only param and is not exactly equivalent.

if grammar is not None:
warnings.warn(
"The `grammar` argument is deprecated and will have no effect on agent behavior.",
DeprecationWarning,
Member

Better use a FutureWarning:

Suggested change
DeprecationWarning,
FutureWarning,

Member

Done.

@@ -922,7 +935,6 @@ def to_dict(self) -> dict[str, Any]:
"prompt_templates": self.prompt_templates,
"max_steps": self.max_steps,
"verbosity_level": int(self.logger.level),
"grammar": self.grammar,
Member

Just a quick note regarding the deprecation because I think it is not well implemented. During the deprecation cycle, the grammar parameter should maintain its previous behavior to ensure backward compatibility, just with the addition of a deprecation warning to inform users of the upcoming change.

This approach helps avoid unexpected breakages for users relying on the current behavior while giving them time to adapt.

I can take care of this if you want.
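A sketch of a backward-compatible deprecation of the kind described above (illustrative class only; the warning text follows the suggestion earlier in this thread):

```python
import warnings


class AgentSketch:
    """Illustrative only: deprecate `grammar` without breaking existing callers."""

    def __init__(self, grammar=None):
        if grammar is not None:
            warnings.warn(
                "Argument 'grammar' is deprecated and will be removed in version 1.20.",
                FutureWarning,
            )
        # Keep the previous behavior during the deprecation cycle.
        self.grammar = grammar
```

Existing code keeps working unchanged, while every call that passes grammar sees the warning until the removal version.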

@@ -172,6 +172,15 @@ def parse_json_blob(json_blob: str) -> tuple[dict[str, str], str]:
)


def extract_code_from_text(text: str) -> str | None:
"""Extract code from the LLM's output."""
pattern = r"```(?:py|python)?\s*\n(.*?)\n```"
Member

Haven't checked this in detail, but according to the system prompt, there should not be triple backticks, should there?

Collaborator Author

Yes! The code extraction from JSON is broken after the last commits. Communication issue; I'll fix it tomorrow morning.

Collaborator Author

Hey @aymeric-roucher @albertvillanova! Can you check the code extraction from JSON for any potential security issues and logic? :)

@albertvillanova albertvillanova force-pushed the structured-generation branch from 3f54ff8 to 98edc0a Compare May 21, 2025 13:41
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

code_action = json.loads(output_text)["code"]
code_action = extract_code_from_text(code_action) or code_action
try:
code_action, _ = parse_json_blob(output_text)
Collaborator

@akseljoonas I don't like this if/else: the two functions seem to be doing the same thing.
Moreover, this try/except logic does not need to be in agents.py, so let's just integrate all the logic in parse_json_blob.
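A sketch of what such a consolidated helper could look like (illustrative; not the PR's final parse_json_blob):

```python
import json
import re


def parse_code_blob(output_text: str) -> str:
    """Parse the code field from a structured output, falling back to raw text."""
    try:
        code = json.loads(output_text)["code"]
    except (json.JSONDecodeError, KeyError, TypeError):
        code = output_text
    # If the code field itself contains a fenced block, unwrap it.
    match = re.search(r"```(?:py|python)?\s*\n(.*?)\n```", code, re.DOTALL)
    return match.group(1).strip() if match else code
```

One entry point handles both the structured JSON case and the plain fenced-code case, so agents.py needs no branching.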

Collaborator

And more generally, since the issue you're trying to solve is specifically due to the InferenceClient provider Together not really respecting structured outputs for one specific model, I'm not convinced it needs handling in smolagents anyway. Let's just not enable it for now and open a follow-up PR to handle it, where you can document the specific issue, which providers/models it affects, and why it's solved with a complex regex.

`str`: The parsed code.
"""

code_match = re.search(r'"code"\s*:\s*"(.*?)"(?=\s*}\s*$|\s*,\s*")', text, re.DOTALL)
Collaborator
@aymeric-roucher aymeric-roucher May 22, 2025

The lazy (.*?) combined with the lookahead could lead to catastrophic backtracking, so this is a security risk that should be covered with test cases. Let's put this in the next PR, cf. the other comment.

@aymeric-roucher aymeric-roucher force-pushed the structured-generation branch from bedeff2 to ef94545 Compare May 22, 2025 13:00
@aymeric-roucher
Collaborator
aymeric-roucher commented May 22, 2025

Thanks @akseljoonas! Merging this, and let's handle the edge cases for some providers in a subsequent PR (I'll open it).

@albertvillanova I changed the argument to use_structured_outputs_internally to disambiguate it from returning structured outputs.

@aymeric-roucher aymeric-roucher merged commit d60c7be into main May 22, 2025
5 checks passed
4 participants