8000 Feature/clavata integration #1025 by ilias-t · Pull Request #1027 · NVIDIA/NeMo-Guardrails · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Feature/clavata integration #1025 #1027

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Apr 24, 2025

Conversation

ilias-t
Copy link
Contributor
@ilias-t ilias-t commented Mar 5, 2025

Description

This PR introduces an integration for Clavata for customized LLM content moderation. It adds a detect_policy_match action which can be used in input and output flows.

Co-authored-by: @boichee

Related Issue(s)

#1025

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes. @Pouyanpi

Outstanding items

  • Sign commits
  • Update commits to adhere to DCO
  • Add Jupyter Notebook example
  • Address PR feedback

@ilias-t ilias-t marked this pull request as draft March 5, 2025 05:17
@boichee
Copy link
Contributor
boichee commented Mar 5, 2025

@Pouyanpi here are some of the questions we had for our discussion tomorrow.

Questions for Nvidia Team

Interface questions

  • What does is_system_action do and should we be using it?
  • Is there a reason why we'd pass the $bot_message or $user_message to the action rather than just extracting them from the context?
  • It seems like we should be using execute_async based on the docs, but none of the other plugins seem to do so, what do you recommend here?
  • What are the @active decorators for in the colang flows? Based on the docs this means the author of the bot (user of NeMO) does not have to explicitly activate the flow in their flow. To me, this comes back to whether users want to use our plugin as a "rail" (see below) or use it more explicitly where and when they choose.

NeMo user questions

You provided the PrivateAI plugin as an example but we're not sure it actually matches our technology and how it might be used. Here's why:

  • PrivateAI is used to strip PII from user and bot messages, so it makes sense that a bot builder would want to have this plugin run on EVERY user and bot message. And this is how we interpret the case where a builder specifies an input and output flow in their config.
  • In contrast, with Clavata.ai Policies, there are likely use cases where a builder would not want a policy to run on EVERY user and bot message. For example:
    • A bot that is capable of issuing refunds to customers: For this use case, our customers would likely want to write a policy that specifically defines the rules for refunds. And this policy wouldn't be relevant to every interaction between the user and the bot. It would likely only be relevant when the user asks about a refund.

With the above in mind, we wanted to understand whether builders using NeMo are able to, and in turn, tend to make use of the plugins within their own flows. If so, this affects how we design the plugin interface. We've already included some of these ideas in our design. For example: You'll notice that in our config we've included the ability to assign human-readable aliases to the policy IDs. We did this with the idea that a NeMo builder might want to call our actions directly from their flows and provide a specific policy to use. The idea here is that its easier to supply the alias then to constantly have policy IDs floating around in their flows as the UUIDs are effectively magic numbers.

Basically we're looking to understand how builders are making use of the guardrails plugins. Are they always activated and used as "rails" that run on every input and output? Or are they activated in specific cases? Our suspicion is that both are true and thus we should be supporting both use cases.

relevant_chunks usage

Our system would definitely benefit from additional context when evaluating user/bot messages, but based on the docs we weren't completely clear what the type of relevant_chunks will be and what information will be included in it.

Ideally, relevant_chunks always includes a set of prior user/bot messages that are relevant to the current message. If this is the case, it would make sense for us to provide these chunks to our API as additional context to help it evaluate the current message in the context of the larger interaction thus far.

This, again, goes back to whether we should be using context to extract user and bot messages or if we should simply be passing the $user_message and $bot_message to the action directly as PrivateAI seems to do.

@cparisien cparisien requested a review from Pouyanpi March 5, 2025 14:08
@ilias-t ilias-t force-pushed the feature/clavata-integration branch 2 times, most recently from ae18325 to 465be03 Compare March 5, 2025 21:55
ilias-t and others added 2 commits March 6, 2025 13:05
- Users of the Clavata integration can now specify the exact labels that must match for the input/output to cause the rail to trigger and abort the flow.
- Fixing some aspects of how the configuration is put together
- Policy ID aliases make it easier to specify a policy by name instead of ID.
- The new action `EvaluateUserInputWithClavataPolicy` allows you to evaluate the user input against a Clavata policy part of a flow that a user has written.
- Added the ability for a user to specify ANY/ALL logic for label matches.

Co-authored-by: Brett Levenson <brett@clavata.ai>
Signed-off-by: Ilias Tsangaris <iliastsangaris@gmail.com>
@ilias-t ilias-t force-pushed the feature/clavata-integration branch from 602660a to 8eec8a6 Compare March 6, 2025 21:05
@ilias-t ilias-t marked this pull request as ready for review March 7, 2025 23:49
@boichee boichee force-pushed the feature/clavata-integration branch 2 times, most recently from b2b35a6 to 7c30e9d Compare March 9, 2025 01:17
- v1 colang requires that the policies for input and output rails be specified in the config because parameters cannot be passed to flows
- v2 colang allows parameters in flow definitions, so it should be possible to simply pass the policy and label arguments at the time the flow is defined.

Also improved the Clavata Client to handle rate limit and exponential backoff situations.
@boichee boichee force-pushed the feature/clavata-integration branch from 7c30e9d to 36c697a Compare March 9, 2025 01:20
Copy link
Contributor
@boichee boichee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a few comments to explain our reasoning and ask some questions on things I wasn't sure about (mostly related to how rails are meant to work in 2.x and how variables are meant to be passed.)

- Fixed a small bug with UUID format when sent to server
@boichee boichee force-pushed the feature/clavata-integration branch from 868a24d to 4aba727 Compare March 9, 2025 02:01
Copy link

Documentation preview

https://nvidia.github.io/NeMo-Guardrails/review/pr-1027

@Pouyanpi Pouyanpi removed their request for review March 17, 2025 11:29
@Pouyanpi Pouyanpi self-assigned this Mar 21, 2025
@Pouyanpi Pouyanpi added this to the v0.13.0 milestone Mar 21, 2025
Copy link
Collaborator
@Pouyanpi Pouyanpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This looks great, we just need to make some adjustments.

This round of review is mostly Colang 2.0 related.

using the migration tool (see its usage):

Here is how your Colang 2.0 file should look like (flows.co)

#### POLICY DETECTION RAILS ####

# INPUT RAIL
@active
flow clavata check input
  """Check if the user input is benign."""
  $is_match = await ClavataCheckV1Action(rail="input", text=$user_message)

  if $is_match
    if $system.config.enable_rails_exceptions
      global $msg
      $msg = "Interaction blocked by clavata check with policy={$policy} and text={$text}"
      send ClavataPolicyMatchException(message=$msg)
    else
      bot refuse to respond
    abort


# OUTPUT RAIL
@active
flow clavata check output
  """Check if the bot output is benign."""
  $is_match = await ClavataCheckV1Action(rail="output", text=$bot_message)

  if $is_match
    if $system.config.enable_rails_exceptions
      global $msg
      $msg = "Interaction blocked by clavata check with policy={$policy} and text={$text}"
      send ClavataPolicyMatchException(message=$msg)
    else
      bot refuse to respond
    abort
    

For your example config in examples/configs/clavata you need to have following config for Colang 2.0 (as mentioned in the comments you need to have two separate configurations thus directories for Colang 1 and 2)

colang_version: 2.x
# Example for colang 1.0
models:
 - type: main
   engine: openai
   model: gpt-3.5-turbo-instruct

rails:
 config:
   clavata:
     policies:
       - alias: Threats
         id: 00000000-0000-0000-0000-000000000000
       - alias: Toxicity
         id: 00000000-0000-0000-0000-000000000000
     # With colang 1.0, we can't pass parameters to flows, so we need to specify the policy to use
     # in the input and output rails.
     input:
       policy: Threats
     output:
       policy: Toxicity
       # You can specify labels to match against as part of the input/output rail configuration
       labels:
         - Hate Speech
         - Self-Harm
 #input:
   #flows:
     #- clavata check input
 #output:
   #flows:
     #- clavata check output

and its rails.co

import guardrails
import nemoguardrails.library

flow input rails $input_text
   clavata check input
flow output rails $output_text
   clavata check output

And
also a main.co file is required:

import core
import llm

flow main
 activate llm continuation
 

As you noted Colang 2.0 accepts argument so you can improve on this default, but having two different experiences for Colang 1 and 2 is something we try to avoid.

I think there are some further requests that I will share in another round of review.

@Pouyanpi Pouyanpi removed this from the v0.13.0 milestone Mar 21, 2025
@ilias-t ilias-t requested a review from Pouyanpi March 24, 2025 19:33
- Separating examples for 1.0 and 2.x colang
- Consolidating action so there's only 1 "action" function for both 1.0 and 2.x (with optional parameters-the action will figure out which approach to use based on what is passed to it).
- Removed the extra action that was an example of returning something other than an boolean.
@boichee
Copy link
Contributor
boichee commented Mar 24, 2025

@Pouyanpi I believe we've now addressed your comments. In the most recent updates, we:

  • Consolidated down to a single @action that can handle both the 1.0 and 2.x forms of integration. Basically, it just looks at whether the rail parameter was passed in, and if so, assumes we're doing 1.0, otherwise 2.x. I'm assuming this is how most people handle this.
  • Similarly, since the text to be evaluated appears to be able to be passed in in 2.x consolidated the flow for 2.x down to a single flow where the text is a parameter as well.
  • Removed that extra action that was seeking to return labels rather than a boolean result. Sounded like we'd have to get more in depth there and since we could always update the integration later, I think we'll start with the basics and see how that goes.
  • Split the examples into clavata and clavata_v2 folders as requested.

The only thing I wasn't sure about was whether the @action decorator should have execute_async=True as I believe that's a colang 2.x feature. Left it out for now, but happy to add back if needed.

Thanks!

Copy link
Collaborator
@Pouyanpi Pouyanpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes.

Test Coverage:

Here are the test coverage report. It'd be great to add more tests (more details in comments)

---------- coverage: platform darwin, python 3.12.2-final-0 ----------
Name                                         Stmts   Miss  Cover   Missing
--------------------------------------------------------------------------
nemoguardrails/library/clavata/__init__.py       0      0   100%
nemoguardrails/library/clavata/actions.py      126    126     0%   18-327
nemoguardrails/library/clavata/errs.py           6      0   100%
nemoguardrails/library/clavata/request.py       76     10    87%   54, 144, 171, 176, 183-184, 192-198
nemoguardrails/library/clavata/utils.py         55     28    49%   20-23, 56, 61-67, 79-104
--------------------------------------------------------------------------
TOTAL                                          263    164    38%

Colang 2.0 implementation

Colang 2.0 flow definitions are syntactically wrong. Please make sure that you test both Colang 1.0 and Colang 2.0 versions of the configs:
Please try:

nemoguardrails chat --config="./examples/configs/clavata_v2"

And resolve the errors.

Migration Compatibility Issue:

When migrating from Colang 1.0 to 2.0, there are two critical differences in how Clavata handles policies and labels (it is an important scenario as a user might migrate from Colang 1.0 to 2.0 at some point in future):

  1. Policy References:

    • Colang 1.0: Policies are referenced in the rail configuration (input/output)
    • Colang 2.0: Policies are referenced directly in flow definitions
  2. Label Handling:

    • Colang 1.0: Labels are defined in the rail configuration
    • Colang 2.0: Labels are expected in flow definitions
      This means existing label configurations will be ignored during migration.

These differences could lead to unexpected behavior where content moderation rules might not be properly transferred during migration. So I think the implementation for Colang 2.0 should still consider lables/policies define in config.yml files but prefers those when passed explicitly in colang. Or you might decide to deprecate that and should warn the user about this inconsistency and how they should migrate.

Thansk!

boichee added 2 commits April 13, 2025 14:19
Pulling in latest changes from develop.
- Added test coverage for all pydantic models used by the integration
- Added test coverage for the exp backoff decorator and the calculation of next retry time.
- Updated the example rails configuration for Colang v2.x so it parses correctly on startup with `nemoguardrails chat`
- Changed policy alias topology to use a dict to prevent accidental re-use of the same alias with a different ID. Updated example configs and documentation to show this difference. Notebook updated as well.
- Consolidated code for the Clavata check action: It now uses two helpers to determine the correct policy ID and labels to use. Precedence is given to policy/label that are passed directly to the action, but if values are not passed, the config will then be checked to obtain the correct values. This should make migration from colang 1 to v2.x smoother.
@ilias-t ilias-t requested a review from Pouyanpi April 15, 2025 19:49
Copy link
Collaborator
@Pouyanpi Pouyanpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you!

Hi @boichee, could you please:

  1. Install and run the pre‑commit hooks on all files:
poetry run pre-commit install
poetry run pre-commit run --all-files
  1. Rebase onto the develop branch and resolve the merge conflict in config.py.
  2. You’ll then see some failing tests becase of HttpUrl (those are easy to fix). (I don’t have push access to your fork, otherwise I’d handle it for you.) Probably the easiest is not using HttpUrl type and use str instead.

Once those steps are done, everything should pass and we’ll be ready to merge. Thanks!

@Pouyanpi Pouyanpi added this to the v0.14.0 milestone Apr 17, 2025
@Pouyanpi Pouyanpi self-requested a review April 22, 2025 07:53
@Pouyanpi Pouyanpi merged commit 17f68cc into NVIDIA:develop Apr 24, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0