Description
One of the defining characteristics of agentic development is its non-determinism. When an agent calls an LLM, it will receive an answer relevant to the context and prompt, but depending on the parameters and inputs, that answer may not be exactly the same from call to call. For this reason, among others, it's very important that kagent enable powerful new feedback and testing strategies for its agents.
Features
Debugging/time-travel
As mentioned earlier, because LLM calls are non-deterministic, it's essential to be able to retry individual calls with different inputs and observe the effect on the output. An agentic system can make many LLM calls for any given request, so users need to be able to test individual calls in that stack, or even replay the stack with different inputs at a given step.
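The replay idea above can be sketched in a few lines. This is a hypothetical illustration, not kagent's actual API: a recorder captures each LLM call in a request's stack so any single step can be re-run later with a modified input, without re-executing the rest of the stack. All names (`CallRecorder`, `replay_step`, `fake_llm`) are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class RecordedCall:
    step: int
    prompt: str
    response: str

@dataclass
class CallRecorder:
    calls: list = field(default_factory=list)

    def record(self, llm, prompt):
        # Execute the call and append it to the recorded stack.
        response = llm(prompt)
        self.calls.append(RecordedCall(len(self.calls), prompt, response))
        return response

    def replay_step(self, llm, step, new_prompt=None):
        # Re-run one recorded step, optionally with a different prompt,
        # to test the effect of the input on that step's output.
        original = self.calls[step]
        prompt = new_prompt if new_prompt is not None else original.prompt
        return llm(prompt)

# Deterministic stand-in for a real model client.
fake_llm = lambda prompt: f"echo: {prompt}"

recorder = CallRecorder()
recorder.record(fake_llm, "plan the task")
recorder.record(fake_llm, "execute step 1")

# Replay step 1 with a modified input.
print(recorder.replay_step(fake_llm, 1, new_prompt="execute step 1 carefully"))
```

In a real system the recorder would also capture model parameters and tool results, and replay would restore the agent's saved state before re-running the step.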
One of the reasons autogen was such an obvious choice for us is that it has many of the building blocks for such a feature, specifically a declarative API and state management.
Evals
Reasoning about the correctness of an agent's solutions/responses is much more complex than in traditional systems, for the reasons mentioned above. Doing so often requires additional agents geared toward these specific use cases. Given that we are already recording all of the information about LLM calls, tools, etc., we should make it easier to evaluate the success of the agents and their calls.
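One common shape for this is an "LLM-as-judge" pass over the recorded trace: a separate grader scores each call and the scores are aggregated. A minimal sketch follows; the judge here is a deterministic stub standing in for a second model call, and all names (`judge`, `evaluate_trace`, the trace schema) are assumptions rather than an existing kagent or autogen API.

```python
def judge(prompt: str, response: str) -> dict:
    # Stub judge: a real system would ask another LLM to grade
    # relevance, groundedness, etc. and return structured output.
    relevant = prompt.split()[0].lower() in response.lower()
    return {"relevant": relevant, "score": 1.0 if relevant else 0.0}

def evaluate_trace(trace: list[dict]) -> float:
    # Average the judge's score across every recorded call in a trace.
    scores = [judge(c["prompt"], c["response"])["score"] for c in trace]
    return sum(scores) / len(scores)

trace = [
    {"prompt": "summarize the incident", "response": "Summarize: pod crashed."},
    {"prompt": "propose a fix", "response": "Propose raising the memory limit."},
]
print(evaluate_trace(trace))  # prints 1.0 for this stub
```

Because the trace is already being recorded, evals like this can run offline over stored data rather than inline with the agent.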
This may take the form of building something on top of existing building blocks, or of integrating with existing solutions; more research is required.
Guided Learning
Guided learning will require a robust eval framework and state management, which is why it appears after them in this list.
One of the most common ways to improve LLM reliability is via prompt engineering and context. Given that the system will have the necessary building blocks for saving/evaluating previous prompts, we could feed this data into future LLM calls to improve the accuracy, thereby allowing the system to "learn" the more it's used.
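One simple way to realize this is few-shot injection: select previously successful prompt/response pairs from the eval store and prepend them to future calls. The sketch below is illustrative; storage and selection are simplified (most recent `k` examples), and the function and variable names are assumptions.

```python
def build_prompt(task: str, history: list[tuple[str, str]], k: int = 2) -> str:
    # Use the k most recent successful examples as few-shot context
    # for the new task, so the system "learns" from prior runs.
    examples = history[-k:]
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {task}\nA:"

# Previously recorded prompt/response pairs that passed evals.
history = [
    ("scale the deployment", "kubectl scale deploy web --replicas=3"),
    ("get pod logs", "kubectl logs web-abc123"),
]
print(build_prompt("restart the deployment", history))
```

A production version would rank stored examples by eval score and similarity to the new task rather than taking the most recent ones.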