Feature/clavata integration #1025 #1027
@Pouyanpi, here are some of the questions we had for our discussion tomorrow.
Questions for Nvidia Team
Interface questions
NeMo user questions
You provided the PrivateAI plugin as an example, but we're not sure it actually matches our technology and how it might be used. Here's why:
With the above in mind, we wanted to understand whether builders using NeMo are able to, and in practice tend to, make use of the plugins within their own flows. If so, this affects how we design the plugin interface. We've already included some of these ideas in our design. For example, you'll notice that our config lets you assign human-readable aliases to policy IDs. We did this because a NeMo builder might want to call our actions directly from their flows and specify a particular policy. It's easier to supply an alias than to keep policy IDs floating around in flows, since the UUIDs are effectively magic numbers. Basically, we're looking to understand how builders use the guardrails plugins. Are they always activated and used as "rails" that run on every input and output? Or are they activated only in specific cases? Our suspicion is that both are true, and thus we should support both use cases.
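To make the alias idea concrete, here is a minimal sketch of how an action might accept either a human-readable alias or a raw policy UUID. The function `resolve_policy_id` and the alias table are hypothetical illustrations, not the integration's actual code, and the UUIDs are placeholders:

```python
import uuid

# Placeholder alias table (hypothetical values, not real Clavata policy IDs).
POLICY_ALIASES = {
    "Threats": "11111111-1111-1111-1111-111111111111",
    "Toxicity": "22222222-2222-2222-2222-222222222222",
}


def resolve_policy_id(policy: str, aliases: dict) -> str:
    """Map an alias to its policy UUID, or validate and normalize a raw UUID."""
    # If `policy` is a known alias, swap in its ID; otherwise treat it as a UUID.
    candidate = aliases.get(policy, policy)
    try:
        # uuid.UUID raises ValueError if the string is not a well-formed UUID;
        # str() returns the canonical lowercase, hyphenated form.
        return str(uuid.UUID(candidate))
    except ValueError:
        raise ValueError(f"unknown policy alias or invalid policy ID: {policy!r}") from None
```

Validating through `uuid.UUID` also normalizes case and catches typos before a request reaches the server.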
Force-pushed from ae18325 to 465be03.
- Users of the Clavata integration can now specify the exact labels that must match for the input/output to trigger the rail and abort the flow.
- Fixed some aspects of how the configuration is put together.
- Policy ID aliases make it easier to specify a policy by name instead of ID.
- The new action `EvaluateUserInputWithClavataPolicy` lets you evaluate the user input against a Clavata policy as part of a flow that a user has written.
- Added the ability for a user to specify ANY/ALL logic for label matches.

Co-authored-by: Brett Levenson <brett@clavata.ai>
Signed-off-by: Ilias Tsangaris <iliastsangaris@gmail.com>
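The ANY/ALL option for label matching boils down to an intersection test versus a subset test. A minimal sketch, assuming the policy evaluation returns the set of matched label names; `LabelMatchMode` and `should_trigger` are hypothetical names, and the behavior when no labels are configured is an assumption:

```python
from enum import Enum


class LabelMatchMode(Enum):
    ANY = "ANY"  # trigger if at least one required label matched
    ALL = "ALL"  # trigger only if every required label matched


def should_trigger(matched: set, required: set, mode: LabelMatchMode) -> bool:
    """Decide whether a rail fires, given the labels a policy evaluation matched."""
    if not required:
        # Assumption: with no labels configured, any policy match triggers the rail.
        return bool(matched)
    if mode is LabelMatchMode.ALL:
        # Every required label must appear among the matched labels.
        return required.issubset(matched)
    # ANY: at least one required label appears among the matched labels.
    return not required.isdisjoint(matched)
```

For example, with required labels `{"Hate Speech", "Self-Harm"}` and only `"Hate Speech"` matched, ANY fires and ALL does not.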
Force-pushed from 602660a to 8eec8a6.
Force-pushed from b2b35a6 to 7c30e9d.
- Colang v1 requires that the policies for input and output rails be specified in the config, because parameters cannot be passed to flows.
- Colang v2 allows parameters in flow definitions, so it should be possible to simply pass the policy and label arguments when the flow is defined.

Also improved the Clavata client to handle rate limiting with exponential backoff.
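Rate-limit handling with exponential backoff is commonly computed as a capped exponential delay with random jitter. The sketch below shows one such scheme ("full jitter"), not the Clavata client's actual implementation; `next_retry_delay`, its defaults, and the treatment of a server-provided Retry-After value are all assumptions:

```python
import random
from typing import Optional


def next_retry_delay(
    attempt: int,
    base: float = 0.5,
    cap: float = 30.0,
    retry_after: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0 = first retry)."""
    if retry_after is not None:
        # A server-supplied Retry-After (e.g. on HTTP 429) takes precedence.
        return max(0.0, retry_after)
    if rng is None:
        rng = random.Random()
    # Exponential growth, capped, with "full jitter": a uniform draw in
    # [0, ceiling] spreads out retries from many clients hitting the same limit.
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

Jitter matters when many clients hit the same rate limit at once: without it, they would all retry in lockstep.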
Force-pushed from 7c30e9d to 36c697a.
Adding a few comments to explain our reasoning and to ask some questions about things I wasn't sure about (mostly related to how rails are meant to work in 2.x and how variables are meant to be passed).
- Fixed a small bug in the UUID format sent to the server.
Force-pushed from 868a24d to 4aba727.
Thank you! This looks great, we just need to make some adjustments.
This round of review is mostly Colang 2.0 related.
using the migration tool (see its usage):
Here is how your Colang 2.0 file (flows.co) should look:
```
#### POLICY DETECTION RAILS ####

# INPUT RAIL
@active
flow clavata check input
  """Check if the user input is benign."""
  $is_match = await ClavataCheckV1Action(rail="input", text=$user_message)
  if $is_match
    if $system.config.enable_rails_exceptions
      global $msg
      $msg = "Interaction blocked by clavata check with policy={$policy} and text={$text}"
      send ClavataPolicyMatchException(message=$msg)
    else
      bot refuse to respond
    abort

# OUTPUT RAIL
@active
flow clavata check output
  """Check if the bot output is benign."""
  $is_match = await ClavataCheckV1Action(rail="output", text=$bot_message)
  if $is_match
    if $system.config.enable_rails_exceptions
      global $msg
      $msg = "Interaction blocked by clavata check with policy={$policy} and text={$text}"
      send ClavataPolicyMatchException(message=$msg)
    else
      bot refuse to respond
    abort
```
For your example config in examples/configs/clavata, you need the following config for Colang 2.0. As mentioned in the comments, you need two separate configurations, and thus two directories, for Colang 1.0 and 2.x:
```yaml
colang_version: 2.x

# Example for colang 1.0
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

rails:
  config:
    clavata:
      policies:
        - alias: Threats
          id: 00000000-0000-0000-0000-000000000000
        - alias: Toxicity
          id: 00000000-0000-0000-0000-000000000000
      # With colang 1.0, we can't pass parameters to flows, so we need to specify the policy to use
      # in the input and output rails.
      input:
        policy: Threats
      output:
        policy: Toxicity
        # You can specify labels to match against as part of the input/output rail configuration
        labels:
          - Hate Speech
          - Self-Harm
  #input:
  #  flows:
  #    - clavata check input
  #output:
  #  flows:
  #    - clavata check output
```
And its rails.co:
```
import guardrails
import nemoguardrails.library

flow input rails $input_text
  clavata check input

flow output rails $output_text
  clavata check output
```
A main.co file is also required:
```
import core
import llm

flow main
  activate llm continuation
```
As you noted, Colang 2.0 accepts arguments, so you can improve on this default, but having two different experiences for Colang 1.0 and 2.x is something we try to avoid.
I think there are some further requests that I will share in another round of review.
- Separated the examples for Colang 1.0 and 2.x.
- Consolidated the action so there's only one "action" function for both 1.0 and 2.x, with optional parameters; the action figures out which approach to use based on what is passed to it.
- Removed the extra action that was an example of returning something other than a boolean.
@Pouyanpi I believe we've now addressed your comments. In the most recent updates, we:
The only thing I wasn't sure about was whether the
Thanks!
Thanks for making the changes.
Test Coverage:
Here is the test coverage report. It'd be great to add more tests (more details in the comments).
```
---------- coverage: platform darwin, python 3.12.2-final-0 ----------
Name                                          Stmts   Miss  Cover   Missing
---------------------------------------------------------------------------
nemoguardrails/library/clavata/__init__.py        0      0   100%
nemoguardrails/library/clavata/actions.py       126    126     0%   18-327
nemoguardrails/library/clavata/errs.py            6      0   100%
nemoguardrails/library/clavata/request.py        76     10    87%   54, 144, 171, 176, 183-184, 192-198
nemoguardrails/library/clavata/utils.py          55     28    49%   20-23, 56, 61-67, 79-104
---------------------------------------------------------------------------
TOTAL                                           263    164    38%
```
Colang 2.0 implementation
Colang 2.0 flow definitions are syntactically wrong. Please make sure that you test both Colang 1.0 and Colang 2.0 versions of the configs:
Please try:
```
nemoguardrails chat --config="./examples/configs/clavata_v2"
```
Then resolve the errors.
Migration Compatibility Issue:
When migrating from Colang 1.0 to 2.0, there are two critical differences in how Clavata handles policies and labels. This is an important scenario, as a user might migrate from Colang 1.0 to 2.0 at some point in the future:
- Policy References:
  - Colang 1.0: Policies are referenced in the rail configuration (input/output).
  - Colang 2.0: Policies are referenced directly in flow definitions.
- Label Handling:
  - Colang 1.0: Labels are defined in the rail configuration.
  - Colang 2.0: Labels are expected in flow definitions.

This means existing label configurations will be ignored during migration.
These differences could lead to unexpected behavior where content moderation rules might not be properly transferred during migration. So I think the Colang 2.0 implementation should still consider labels/policies defined in config.yml files, but prefer those passed explicitly in Colang. Alternatively, you might decide to deprecate the config-based approach, in which case you should warn the user about this inconsistency and explain how to migrate.
Thanks!
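The precedence suggested here (explicit arguments first, config.yml as the fallback) can be sketched as a small helper. `resolve_policy_and_labels` is a hypothetical name, and representing the `rails.config.clavata` section from the example config as a plain dict is an assumption about how the integration reads it:

```python
from typing import List, Optional, Sequence, Tuple


def resolve_policy_and_labels(
    rail: str,
    clavata_config: dict,
    policy: Optional[str] = None,
    labels: Optional[Sequence[str]] = None,
) -> Tuple[str, List[str]]:
    """Prefer values passed explicitly (e.g. from a Colang 2.x flow); fall back to config."""
    rail_config = clavata_config.get(rail) or {}
    resolved_policy = policy if policy is not None else rail_config.get("policy")
    if resolved_policy is None:
        raise ValueError(f"no Clavata policy passed or configured for the {rail!r} rail")
    # An explicitly passed empty list is respected and means "no label filter".
    if labels is not None:
        resolved_labels = list(labels)
    else:
        resolved_labels = list(rail_config.get("labels", []))
    return resolved_policy, resolved_labels


# Mirrors the input/output part of the example `clavata` config (aliases omitted).
EXAMPLE_CONFIG = {
    "input": {"policy": "Threats"},
    "output": {"policy": "Toxicity", "labels": ["Hate Speech", "Self-Harm"]},
}
```

Because Colang 1.0 configs never pass arguments, they fall through to the config values unchanged, while Colang 2.x flows can override either value per call.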
Pulling in latest changes from develop.
- Added test coverage for all Pydantic models used by the integration.
- Added test coverage for the exponential backoff decorator and the calculation of the next retry time.
- Updated the example rails configuration for Colang 2.x so it parses correctly on startup with `nemoguardrails chat`.
- Changed the policy alias topology to use a dict, to prevent accidental reuse of the same alias with a different ID. Updated the example configs and documentation to show this difference. The notebook is updated as well.
- Consolidated the code for the Clavata check action: it now uses two helpers to determine the correct policy ID and labels. Precedence is given to a policy/labels passed directly to the action; if values are not passed, the config is checked to obtain them. This should make migration from Colang 1.0 to 2.x smoother.
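With a dict keyed by alias, each alias can hold only one ID by construction. For an older list-style config like the one in the review example, a conversion helper along these lines could surface conflicting duplicates explicitly. This is a hypothetical sketch; `aliases_from_list` is not part of the PR:

```python
from typing import Dict, Iterable, Mapping


def aliases_from_list(entries: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Convert list-style [{"alias": ..., "id": ...}] entries to an alias -> ID dict."""
    aliases: Dict[str, str] = {}
    for entry in entries:
        alias, policy_id = entry["alias"], entry["id"]
        existing = aliases.get(alias)
        if existing is not None and existing != policy_id:
            # The failure mode a list allows: one alias silently bound to two IDs.
            raise ValueError(f"alias {alias!r} maps to both {existing} and {policy_id}")
        aliases[alias] = policy_id
    return aliases
```

Exact repeats (same alias, same ID) are tolerated, since they cannot change which policy an alias refers to.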
LGTM! Thank you!
Hi @boichee, could you please:
- Install and run the pre-commit hooks on all files:
  ```
  poetry run pre-commit install
  poetry run pre-commit run --all-files
  ```
- Rebase onto the develop branch and resolve the merge conflict in config.py.
- You’ll then see some failing tests because of HttpUrl (those are easy to fix). (I don’t have push access to your fork, otherwise I’d handle it for you.) Probably the easiest fix is not to use the `HttpUrl` type and to use `str` instead.

Once those steps are done, everything should pass and we’ll be ready to merge. Thanks!
Description
This PR introduces an integration with Clavata for customized LLM content moderation. It adds a `detect_policy_match` action, which can be used in input and output flows.

Co-authored-by: @boichee
Related Issue(s)
#1025
Checklist
Outstanding items