Feature/clavata integration #1025 #1027

ilias-t · 2025-03-05T05:15:09Z

Description

This PR introduces an integration for Clavata for customized LLM content moderation. It adds a detect_policy_match action which can be used in input and output flows.

Co-authored-by: @boichee

Related Issue(s)

#1025

Checklist

I've read the CONTRIBUTING guidelines.
I've updated the documentation if applicable.
I've added tests if applicable.
@mentions of the person or team responsible for reviewing proposed changes. @Pouyanpi

Outstanding items

Sign commits
Update commits to adhere to DCO
Add Jupyter Notebook example
Address PR feedback

boichee · 2025-03-05T06:45:09Z

@Pouyanpi here are some of the questions we had for our discussion tomorrow.

Questions for Nvidia Team

Interface questions

What does is_system_action do and should we be using it?
Is there a reason why we'd pass the $bot_message or $user_message to the action rather than just extracting them from the context?
It seems like we should be using execute_async based on the docs, but none of the other plugins seem to do so, what do you recommend here?
What are the @active decorators for in the colang flows? Based on the docs this means the author of the bot (user of NeMO) does not have to explicitly activate the flow in their flow. To me, this comes back to whether users want to use our plugin as a "rail" (see below) or use it more explicitly where and when they choose.

NeMo user questions

You provided the PrivateAI plugin as an example but we're not sure it actually matches our technology and how it might be used. Here's why:

PrivateAI is used to strip PII from user and bot messages, so it makes sense that a bot builder would want to have this plugin run on EVERY user and bot message. And this is how we interpret the case where a builder specifies an input and output flow in their config.
In contrast, with Clavata.ai Policies, there are likely use cases where a builder would not want a policy to run on EVERY user and bot message. For example:
- A bot that is capable of issuing refunds to customers: For this use case, our customers would likely want to write a policy that specifically defines the rules for refunds. And this policy wouldn't be relevant to every interaction between the user and the bot. It would likely only be relevant when the user asks about a refund.

With the above in mind, we wanted to understand whether builders using NeMo are able to, and in turn, tend to make use of the plugins within their own flows. If so, this affects how we design the plugin interface. We've already included some of these ideas in our design. For example: You'll notice that in our config we've included the ability to assign human-readable aliases to the policy IDs. We did this with the idea that a NeMo builder might want to call our actions directly from their flows and provide a specific policy to use. The idea here is that its easier to supply the alias then to constantly have policy IDs floating around in their flows as the UUIDs are effectively magic numbers.

Basically we're looking to understand how builders are making use of the guardrails plugins. Are they always activated and used as "rails" that run on every input and output? Or are they activated in specific cases? Our suspicion is that both are true and thus we should be supporting both use cases.

`relevant_chunks` usage

Our system would definitely benefit from additional context when evaluating user/bot messages, but based on the docs we weren't completely clear what the type of relevant_chunks will be and what information will be included in it.

Ideally, relevant_chunks always includes a set of prior user/bot messages that are relevant to the current message. If this is the case, it would make sense for us to provide these chunks to our API as additional context to help it evaluate the current message in the context of the larger interaction thus far.

This, again, goes back to whether we should be using context to extract user and bot messages or if we should simply be passing the $user_message and $bot_message to the action directly as PrivateAI seems to do.

docs/user-guides/community/clavata.md

nemoguardrails/library/clavata/actions.py

nemoguardrails/library/clavata/request.py

- Users of the Clavata integration can now specify the exact labels that must match for the input/output to cause the rail to trigger and abort the flow. - Fixing some aspects of how the configuration is put together - Policy ID aliases make it easier to specify a policy by name instead of ID. - The new action `EvaluateUserInputWithClavataPolicy` allows you to evaluate the user input against a Clavata policy part of a flow that a user has written. - Added the ability for a user to specify ANY/ALL logic for label matches. Co-authored-by: Brett Levenson <brett@clavata.ai> Signed-off-by: Ilias Tsangaris <iliastsangaris@gmail.com>

- v1 colang requires that the policies for input and output rails be specified in the config because parameters cannot be passed to flows - v2 colang allows parameters in flow definitions, so it should be possible to simply pass the policy and label arguments at the time the flow is defined. Also improved the Clavata Client to handle rate limit and exponential backoff situations.

boichee

Adding a few comments to explain our reasoning and ask some questions on things I wasn't sure about (mostly related to how rails are meant to work in 2.x and how variables are meant to be passed.)

docs/user-guides/community/clavata.md

examples/configs/clavata/rails.co

nemoguardrails/library/clavata/actions.py

nemoguardrails/rails/llm/config.py

nemoguardrails/library/clavata/utils.py

- Fixed a small bug with UUID format when sent to server

github-actions · 2025-03-17T11:29:32Z

Documentation preview

https://nvidia.github.io/NeMo-Guardrails/review/pr-1027

docs/user-guides/community/clavata.md

examples/configs/clavata/config.yml

Pouyanpi

Thank you! This looks great, we just need to make some adjustments.

This round of review is mostly Colang 2.0 related.

using the migration tool (see its usage):

Here is how your Colang 2.0 file should look like (flows.co)

#### POLICY DETECTION RAILS ####

# INPUT RAIL
@active
flow clavata check input
  """Check if the user input is benign."""
  $is_match = await ClavataCheckV1Action(rail="input", text=$user_message)

  if $is_match
    if $system.config.enable_rails_exceptions
      global $msg
      $msg = "Interaction blocked by clavata check with policy={$policy} and text={$text}"
      send ClavataPolicyMatchException(message=$msg)
    else
      bot refuse to respond
    abort


# OUTPUT RAIL
@active
flow clavata check output
  """Check if the bot output is benign."""
  $is_match = await ClavataCheckV1Action(rail="output", text=$bot_message)

  if $is_match
    if $system.config.enable_rails_exceptions
      global $msg
      $msg = "Interaction blocked by clavata check with policy={$policy} and text={$text}"
      send ClavataPolicyMatchException(message=$msg)
    else
      bot refuse to respond
    abort

For your example config in examples/configs/clavata you need to have following config for Colang 2.0 (as mentioned in the comments you need to have two separate configurations thus directories for Colang 1 and 2)

colang_version: 2.x
# Example for colang 1.0
models:
 - type: main
   engine: openai
   model: gpt-3.5-turbo-instruct

rails:
 config:
   clavata:
     policies:
       - alias: Threats
         id: 00000000-0000-0000-0000-000000000000
       - alias: Toxicity
         id: 00000000-0000-0000-0000-000000000000
     # With colang 1.0, we can't pass parameters to flows, so we need to specify the policy to use
     # in the input and output rails.
     input:
       policy: Threats
     output:
       policy: Toxicity
       # You can specify labels to match against as part of the input/output rail configuration
       labels:
         - Hate Speech
         - Self-Harm
 #input:
   #flows:
     #- clavata check input
 #output:
   #flows:
     #- clavata check output

and its rails.co

import guardrails
import nemoguardrails.library

flow input rails $input_text
   clavata check input
flow output rails $output_text
   clavata check output

And
also a main.co file is required:

import core
import llm

flow main
 activate llm continuation

As you noted Colang 2.0 accepts argument so you can improve on this default, but having two different experiences for Colang 1 and 2 is something we try to avoid.

I think there are some further requests that I will share in another round of review.

- Separating examples for 1.0 and 2.x colang - Consolidating action so there's only 1 "action" function for both 1.0 and 2.x (with optional parameters-the action will figure out which approach to use based on what is passed to it). - Removed the extra action that was an example of returning something other than an boolean.

boichee · 2025-03-24T21:10:27Z

@Pouyanpi I believe we've now addressed your comments. In the most recent updates, we:

Consolidated down to a single @action that can handle both the 1.0 and 2.x forms of integration. Basically, it just looks at whether the rail parameter was passed in, and if so, assumes we're doing 1.0, otherwise 2.x. I'm assuming this is how most people handle this.
Similarly, since the text to be evaluated appears to be able to be passed in in 2.x consolidated the flow for 2.x down to a single flow where the text is a parameter as well.
Removed that extra action that was seeking to return labels rather than a boolean result. Sounded like we'd have to get more in depth there and since we could always update the integration later, I think we'll start with the basics and see how that goes.
Split the examples into clavata and clavata_v2 folders as requested.

The only thing I wasn't sure about was whether the @action decorator should have execute_async=True as I believe that's a colang 2.x feature. Left it out for now, but happy to add back if needed.

Thanks!

Pouyanpi

Thanks for making the changes.

Test Coverage:

Here are the test coverage report. It'd be great to add more tests (more details in comments)

---------- coverage: platform darwin, python 3.12.2-final-0 ----------
Name                                         Stmts   Miss  Cover   Missing
--------------------------------------------------------------------------
nemoguardrails/library/clavata/__init__.py       0      0   100%
nemoguardrails/library/clavata/actions.py      126    126     0%   18-327
nemoguardrails/library/clavata/errs.py           6      0   100%
nemoguardrails/library/clavata/request.py       76     10    87%   54, 144, 171, 176, 183-184, 192-198
nemoguardrails/library/clavata/utils.py         55     28    49%   20-23, 56, 61-67, 79-104
--------------------------------------------------------------------------
TOTAL                                          263    164    38%

Colang 2.0 implementation

Colang 2.0 flow definitions are syntactically wrong. Please make sure that you test both Colang 1.0 and Colang 2.0 versions of the configs:
Please try:

nemoguardrails chat --config="./examples/configs/clavata_v2"

And resolve the errors.

Migration Compatibility Issue:

When migrating from Colang 1.0 to 2.0, there are two critical differences in how Clavata handles policies and labels (it is an important scenario as a user might migrate from Colang 1.0 to 2.0 at some point in future):

Policy References:
- Colang 1.0: Policies are referenced in the rail configuration (input/output)
- Colang 2.0: Policies are referenced directly in flow definitions
Label Handling:
- Colang 1.0: Labels are defined in the rail configuration
- Colang 2.0: Labels are expected in flow definitions
  This means existing label configurations will be ignored during migration.

These differences could lead to unexpected behavior where content moderation rules might not be properly transferred during migration. So I think the implementation for Colang 2.0 should still consider lables/policies define in config.yml files but prefers those when passed explicitly in colang. Or you might decide to deprecate that and should warn the user about this inconsistency and how they should migrate.

Thansk!

nemoguardrails/library/clavata/actions.py

nemoguardrails/library/clavata/utils.py

examples/configs/clavata_v2/rails.co

Pulling in latest changes from develop.

- Added test coverage for all pydantic models used by the integration - Added test coverage for the exp backoff decorator and the calculation of next retry time. - Updated the example rails configuration for Colang v2.x so it parses correctly on startup with `nemoguardrails chat` - Changed policy alias topology to use a dict to prevent accidental re-use of the same alias with a different ID. Updated example configs and documentation to show this difference. Notebook updated as well. - Consolidated code for the Clavata check action: It now uses two helpers to determine the correct policy ID and labels to use. Precedence is given to policy/label that are passed directly to the action, but if values are not passed, the config will then be checked to obtain the correct values. This should make migration from colang 1 to v2.x smoother.

Pouyanpi

LGTM! Thank you!

Hi @boichee, could you please:

Install and run the pre‑commit hooks on all files:

poetry run pre-commit install
poetry run pre-commit run --all-files

Rebase onto the develop branch and resolve the merge conflict in config.py.
You’ll then see some failing tests becase of HttpUrl (those are easy to fix). (I don’t have push access to your fork, otherwise I’d handle it for you.) Probably the easiest is not using HttpUrl type and use str instead.

Once those steps are done, everything should pass and we’ll be ready to merge. Thanks!

…integration

nemoguardrails/library/clavata/flows.co

ilias-t marked this pull request as draft March 5, 2025 05:17

boichee reviewed Mar 5, 2025

View reviewed changes

docs/user-guides/community/clavata.md Outdated Show resolved Hide resolved

boichee reviewed Mar 5, 2025

View reviewed changes

nemoguardrails/library/clavata/actions.py Outdated Show resolved Hide resolved

boichee reviewed Mar 5, 2025

View reviewed changes

nemoguardrails/library/clavata/request.py Outdated Show resolved Hide resolved

cparisien requested a review from Pouyanpi March 5, 2025 14:08

ilias-t force-pushed the feature/clavata-integration branch 2 times, most recently from ae18325 to 465be03 Compare March 5, 2025 21:55

ilias-t and others added 2 commits March 6, 2025 13:05

Add tests

8eec8a6

ilias-t force-pushed the feature/clavata-integration branch from 602660a to 8eec8a6 Compare March 6, 2025 21:05

ilias-t marked this pull request as ready for review March 7, 2025 23:49

boichee force-pushed the feature/clavata-integration branch 2 times, most recently from b2b35a6 to 7c30e9d Compare March 9, 2025 01:17

boichee force-pushed the feature/clavata-integration branch from 7c30e9d to 36c697a Compare March 9, 2025 01:20

boichee reviewed Mar 9, 2025

View reviewed changes

Adding notebook demostrating use of Clavata.ai wtth Guardrails

4aba727

- Fixed a small bug with UUID format when sent to server

boichee force-pushed the feature/clavata-integration branch from 868a24d to 4aba727 Compare March 9, 2025 02:01

Pouyanpi removed their request for review March 17, 2025 11:29

Pouyanpi self-assigned this Mar 21, 2025

Pouyanpi added the status: in review label Mar 21, 2025

Pouyanpi added this to the v0.13.0 milestone Mar 21, 2025

Pouyanpi reviewed Mar 21, 2025

View reviewed changes

docs/user-guides/community/clavata.md Outdated Show resolved Hide resolved

Pouyanpi reviewed Mar 21, 2025

View reviewed changes

examples/configs/clavata/config.yml Outdated Show resolved Hide resolved

Pouyanpi reviewed Mar 21, 2025

View reviewed changes

Pouyanpi removed this from the v0.13.0 milestone Mar 21, 2025

PR feedback

646770b

ilias-t requested a review from Pouyanpi March 24, 2025 19:33

ilias-t added 2 commits March 26, 2025 12:14

Merge branch 'develop' into feature/clavata-integration

ccb4f32

Merge branch 'develop' into feature/clavata-integration

41b6fdc

Pouyanpi requested changes Mar 31, 2025

View reviewed changes

nemoguardrails/library/clavata/actions.py Outdated Show resolved Hide resolved

nemoguardrails/library/clavata/actions.py Show resolved Hide resolved

nemoguardrails/library/clavata/utils.py Show resolved Hide resolved

examples/configs/clavata_v2/rails.co Outdated Show resolved Hide resolved

boichee added 2 commits April 13, 2025 14:19

Merge branch 'develop' into feature/clavata-integration

3cd57de

Pulling in latest changes from develop.

ilias-t requested a review from Pouyanpi April 15, 2025 19:49

Pouyanpi approved these changes Apr 17, 2025

View reviewed changes

ilias-t added 3 commits April 17, 2025 15:59

run pre-commit

58cea47

Merge remote-tracking branch 'upstream/develop' into feature/clavata-…

baa1b50

…integration

use str in favor of httpurl

4cad9cc

Pouyanpi added this to the v0.14.0 milestone Apr 17, 2025

ilias-t added 2 commits April 17, 2025 18:13

fix issue with dashes getting stripped from policy ids

4d482ec

python 3.9 compat

fe0df6c

Pouyanpi self-requested a review April 22, 2025 07:53

Use union operator rather than pipe for python 3.9

809c8eb

Pouyanpi approved these changes Apr 23, 2025

View reviewed changes

Pouyanpi reviewed Apr 24, 2025

View reviewed changes

nemoguardrails/library/clavata/flows.co Show resolved Hide resolved

Pouyanpi merged commit 17f68cc into NVIDIA:develop Apr 24, 2025
8 checks passed

Pouyanpi removed the status: in review label Apr 29, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature/clavata integration #1025 #1027

Feature/clavata integration #1025 #1027

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Feature/clavata integration #1025 #1027

Feature/clavata integration #1025 #1027

Uh oh!

Conversation

Uh oh!

Description

Related Issue(s)

Checklist

Outstanding items

Uh oh!

Questions for Nvidia Team

Interface questions

NeMo user questions

relevant_chunks usage

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Documentation preview

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Test Coverage:

Colang 2.0 implementation

Migration Compatibility Issue:

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

`relevant_chunks` usage