8000 [OPIK-1834] [P SDK] ConversationalCoherence conversation metric by yaricom · Pull Request #2579 · comet-ml/opik · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[OPIK-1834] [P SDK] ConversationalCoherence conversation metric #2579

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: OPIK-1833-threads-evaluation-engine
Choose a base branch
from

Conversation

yaricom
Copy link
Member
@yaricom yaricom commented Jun 25, 2025

Details

Conversational Coherence Score: whether the conversation session felt like a natural, adaptive, helpful interaction.

The metric calculated as following:

$$coherenceScore = \frac{relevantTurns}{totalTurns}$$

The ConversationalCoherenceMetric first constructs a sliding windows of turns for each turn, before using an LLM to determine whether the last turn in each sliding window has an "assistant" content that is relevant to the previous conversational context found in the sliding window

Testing

Added related unit tests

Documentation

Added related docstrings

yaricom added 2 commits June 25, 2025 18:57
…entation.

- Introduced `ConversationalCoherenceMetric` for evaluating coherence of conversation exchanges within sliding windows.
- Implemented templates and schema for constructing evaluation queries and parsing responses.
- Added helper methods to generate sliding windows of conversation turns.
- Included Pydantic models for validating and handling evaluation responses.
- Integrated scoring calculation and irrelevancy extraction logic for detailed analysis.
Copy link
Contributor
github-actions bot commented Jun 25, 2025

SDK Unit Tests Results

636 tests   635 ✅  27s ⏱️
  1 suites    0 💤
  1 files      0 ❌  1 🔥

For more details on these errors, see this check.

Results for commit bf0da27.

♻️ This comment has been updated with latest results.

< 6104 a href="/comet-ml/opik/pull/2579/commits/bf0da27f82f0bcd5668a73e48a385258254217af" class="Link--secondary">bf0da27
…ss metrics.

- Added `ConversationTurn` class for structured representation of conversation exchanges.
- Implemented `build_conversation_turns` and `merge_turns` utilities for processing conversational data.
- Refactored window-based scoring logic to use conversation turns, improving clarity and modularity.
- Introduced comprehensive tests for conversational coherence and session completeness metrics, covering error scenarios and ensuring robustness.
- Improved error handling and logging across metric computation functions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant
0