create `explain_info` context variable in `stream_async` to keep trac… #1193

Conversation
NVIDIA#1181) Fixes NVIDIA#1124 Signed-off-by: Mike McKiernan <mmckiernan@nvidia.com>
Thank you Sandro, looks good 👍🏻
Two suggestions:
It'd be nice to make it a method and use it everywhere that same logic is used:
```python
def _ensure_explain_info(self) -> ExplainInfo:
    explain_info = explain_info_var.get()
    if explain_info is None:
        explain_info = ExplainInfo()
        explain_info_var.set(explain_info)
    return explain_info
```
and

```python
self.explain_info = self._ensure_explain_info()
```

so we should remove `_update_explain_info` in `_run_output_rails_in_streaming` and use `_ensure_explain_info` instead.
If you can make this change and add one test, we are good to merge.
Also, don't forget to rebase onto the develop branch before making the changes.
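For context, the get-or-create pattern above relies on a module-level context variable. A minimal sketch of that declaration, assuming only what the snippet above names (`explain_info_var`, `ExplainInfo`); the real declaration in the codebase may differ:

```python
import contextvars
from typing import Optional

# Each async task sees its own value of this variable, so concurrent
# stream_async() calls do not overwrite each other's ExplainInfo.
explain_info_var: contextvars.ContextVar[Optional["ExplainInfo"]] = (
    contextvars.ContextVar("explain_info", default=None)
)
```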
* feat: add RailException support and improve error handling
  - Add TypedDict for structured return values
  - Implement RailException for injection detection (a must-have for checks)
  - Improve error handling for malformed YARA rules
* improve test coverage
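As a rough illustration of what that commit describes — the names and fields below are assumptions from the commit message, not the project's actual definitions:

```python
from typing import List, TypedDict


class InjectionDetectionResult(TypedDict):
    """Hypothetical structured return value for the detection action."""

    is_injection: bool
    detected_rules: List[str]


class RailException(Exception):
    """Hypothetical exception raised when an injection check must hard-fail."""
```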
@mikemckiernan this should resolve the issue you were facing prior to opening #1146.
@Pouyanpi changes made:
Thank you @andompesta, this is GREAT 🥇
…ler.py
- Modified the `StreamingConsumer` class to store its asyncio task.
- Added a `cancel()` method and handled `asyncio.CancelledError`.
- Updated the test functions (`test_single_chunk`, `test_sequence_of_chunks`) and the helper function (`_test_pattern_case`) to call `consumer.cancel()` in a `finally` block.

These changes prevent `RuntimeError: Event loop is closed` and `Task was destroyed but it is pending!` warnings by ensuring background tasks are correctly cancelled and awaited upon test completion.
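The cancellation pattern that commit describes looks roughly like this (a simplified sketch; the real `StreamingConsumer` in the tests carries additional streaming state):

```python
import asyncio
from typing import Optional


class StreamingConsumer:
    """Sketch of a consumer that keeps a handle on its background task."""

    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None

    def start(self, coro) -> None:
        # Schedule the consuming coroutine and remember the task handle.
        self._task = asyncio.ensure_future(coro)

    async def cancel(self) -> None:
        # Cancel the background task and await it, swallowing the expected
        # CancelledError so teardown leaves no pending tasks behind.
        if self._task is not None and not self._task.done():
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
```

and, in a test, the `finally` block would call it:

```python
consumer = StreamingConsumer()
consumer.start(stream_chunks())  # `stream_chunks` is a placeholder coroutine
try:
    ...  # assertions on the consumed output
finally:
    await consumer.cancel()
```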
coverage
- Added tests for various functionalities of StreamingHandler, including:
  - Piping streams between handlers
  - Buffering enable/disable behavior
  - Handling multiple stop tokens
  - Metadata inclusion and processing
  - Suffix and prefix pattern handling
  - Edge cases for the `__anext__` method
  - First-token handling and generation info
- Improved test coverage for async methods and error scenarios.
- Addressed potential issues with streaming termination signals.
…ling
- Replaced the inconsistent use of `None` and `""` for stream termination in `StreamingHandler` with a dedicated `END_OF_STREAM` sentinel object.
- Modified `push_chunk` to convert `None` to `END_OF_STREAM`.
- Updated `__anext__` to raise `StopAsyncIteration` only for `END_OF_STREAM` and to return empty strings, or dicts with empty/None text, as data.
- Adjusted `_process` to correctly handle `END_OF_STREAM` in the buffering and queueing logic.
- Updated `on_llm_end` to use `END_OF_STREAM`.
- Revised the tests in `test_streaming_handler.py` to reflect these changes, including how empty first tokens are handled and how `__anext__` behaves with various inputs.
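The sentinel pattern that commit describes reduces to something like the following (a minimal sketch; the actual `StreamingHandler` also handles buffering, stop tokens, and metadata):

```python
import asyncio

# A unique module-level sentinel; identity checks with `is` cannot collide
# with legitimate chunk values such as None, "", or {}.
END_OF_STREAM = object()


class StreamingHandler:
    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()

    async def push_chunk(self, chunk) -> None:
        # Normalize the legacy termination signal (None) to the sentinel.
        if chunk is None:
            chunk = END_OF_STREAM
        await self.queue.put(chunk)

    def __aiter__(self):
        return self

    async def __anext__(self):
        element = await self.queue.get()
        # Only the sentinel ends iteration; empty strings come through as data.
        if element is END_OF_STREAM:
            raise StopAsyncIteration
        return element
```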
…nd llm calls are properly tracked
Force-pushed from d7f8a4d to fc1b922.
Pull Request Overview
This PR addresses two areas: fixing a minor tracking bug for LLM call explanations via the new explain_info context variable, and standardizing streaming end-of-stream signaling using a sentinel object. In addition, it updates injection detection functions to return structured results and improves documentation on configuration and streaming.
Reviewed Changes
Copilot reviewed 8 out of 10 changed files in this pull request and generated 2 comments.
File | Description
---|---
tests/test_llm_rails_context_variables.py | Added new tests to validate streaming async mode and explain_info tracking.
tests/test_injection_detection.py | Added and updated tests for injection detection guardrails and error handling.
nemoguardrails/streaming.py | Updated streaming logic to use the END_OF_STREAM sentinel consistently instead of None/empty strings.
nemoguardrails/rails/llm/llmrails.py | Refactored explain_info handling by introducing _ensure_explain_info for consistency.
nemoguardrails/library/injection_detection/actions.py | Updated injection detection functions to return structured tuples and adjusted their docstrings.
docs/user-guides/configuration-guide.md & docs/user-guides/advanced/streaming.md | Revised documentation to reflect updated configuration and streaming usage details.
Files not reviewed (2)
- nemoguardrails/library/injection_detection/flows.co: Language not supported
- nemoguardrails/library/injection_detection/flows.v1.co: Language not supported
Comments suppressed due to low confidence (2)
nemoguardrails/library/injection_detection/actions.py:212
- The `_omit_injection` function now returns a tuple `(bool, str)` instead of just a string. Ensure that all downstream consumers are updated to handle this new tuple structure.

```python
def _omit_injection(text: str, matches: list["yara.Match"]) -> Tuple[bool, str]:
```

nemoguardrails/library/injection_detection/actions.py:292
- The `_reject_injection` function now returns a tuple `(bool, List[str])` for detected rules instead of a string. Verify that all call sites correctly handle the updated return type.

```python
def _reject_injection(text: str, rules: "yara.Rules") -> Tuple[bool, List[str]]:
```
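As a hypothetical illustration of updating a call site for the new return type (`check_user_input` and the stub body are invented for the example; only `_reject_injection`'s signature comes from the PR):

```python
from typing import List, Tuple


def _reject_injection(text: str, rules) -> Tuple[bool, List[str]]:
    # Stub stand-in for the real function in actions.py: report whether any
    # YARA rule matched and which rule names fired.
    matches = rules.match(data=text) if rules else []
    return bool(matches), [match.rule for match in matches]


def check_user_input(text: str, rules) -> str:
    # Caller updated to unpack the tuple instead of expecting a bare string.
    is_injection, detected_rules = _reject_injection(text, rules)
    if is_injection:
        return "Blocked by rules: " + ", ".join(detected_rules)
    return text
```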
```diff
 if element is END_OF_STREAM:
     break

 if isinstance(element, dict):
-    if element is not None and (
-        element.get("text") is None or element.get("text") == ""
-    ):
+    if element is not None and (element.get("text") is END_OF_STREAM):
```
[nitpick] Repeated checks against the END_OF_STREAM sentinel appear throughout the streaming logic. Consider extracting this check into a helper function to improve clarity and reduce duplication.
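Such a helper might look like this (a sketch only; the maintainers may scope or name it differently):

```python
def _is_end_of_stream(element) -> bool:
    # True for the sentinel itself, or for a dict chunk whose "text" field
    # carries the sentinel.
    if element is END_OF_STREAM:
        return True
    return isinstance(element, dict) and element.get("text") is END_OF_STREAM
```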
```diff
 # We also keep a general reference to this object
-self.explain_info = explain_info
+self.explain_info = self._ensure_explain_info()
```
[nitpick] The explain_info context handling has been refactored to use _ensure_explain_info. Consider updating the corresponding docstring to clarify its purpose and behavior.
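A docstring along these lines would address the nitpick (the wording is a suggestion; the body repeats the snippet proposed earlier in the thread):

```python
def _ensure_explain_info(self) -> ExplainInfo:
    """Return the ExplainInfo for the current async context.

    Reads the explain_info_var context variable; when it is unset, lazily
    creates a new ExplainInfo, stores it back into the context variable,
    and returns it, so streaming requests always have an object on which
    to record LLM calls.
    """
    explain_info = explain_info_var.get()
    if explain_info is None:
        explain_info = ExplainInfo()
        explain_info_var.set(explain_info)
    return explain_info
```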
Description
@Pouyanpi this PR is meant to solve a minor bug in tracking the LLM calls made by NeMo Guardrails when used in streaming async mode.
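The scenario being fixed looks roughly like this (a sketch assuming the public `LLMRails` streaming API and an illustrative config path):

```python
import asyncio

from nemoguardrails import LLMRails, RailsConfig


async def main() -> None:
    config = RailsConfig.from_path("./config")  # illustrative path
    rails = LLMRails(config)

    # Before this fix, LLM calls made inside stream_async were not recorded
    # in the per-request explain info.
    async for chunk in rails.stream_async(
        messages=[{"role": "user", "content": "Hello!"}]
    ):
        print(chunk, end="")

    # explain() should now report the LLM call(s) made during streaming.
    info = rails.explain()
    print(f"\nTracked {len(info.llm_calls)} LLM call(s).")


asyncio.run(main())
```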
Related Issue(s)
issue-1192
Checklist