Utils
EvaluationCallManager
Manages the evaluation calls for a specific project and entity in Weave.
It initializes the evaluation calls for that project and entity, collects guardrail guard calls from evaluation predictions and scores, and renders them into a structured format suitable for display in Streamlit.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entity | str | The entity name. | required |
project | str | The project name. | required |
call_id | str | The call id. | required |
max_count | int | The maximum number of guardrail guard calls to collect from the evaluation. | 10 |
Source code in guardrails_genie/utils.py
collect_guardrail_guard_calls_from_eval()
Collects guardrail guard calls from evaluation predictions and scores.
This function iterates through the child calls of the base evaluation call, extracting each guardrail guard call and its associated score. It stops collecting when it encounters an "Evaluation.summarize" operation or when the maximum count of guardrail guard calls is reached. The collected calls are stored in a list of dictionaries, each containing the input prompt, outputs, and score.
Returns:
Type | Description |
---|---|
list | A list of dictionaries, each containing: input_prompt (str), the input prompt for the guard call; outputs (dict), the outputs of the guard call; score (dict), the score of the guard call. |
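The collection loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in guardrails_genie/utils.py: `StubCall`, `collect_guard_calls`, and `make_prediction` are hypothetical stand-ins rather than Weave types, and the assumed call layout (each child holding a guard call followed by a score call) and the `"prompt"`, `"safe"`, and `"correct"` keys are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StubCall:
    """Hypothetical stand-in for an evaluation call; not a Weave type."""
    op_name: str
    inputs: dict[str, Any] = field(default_factory=dict)
    output: dict[str, Any] = field(default_factory=dict)
    children: list["StubCall"] = field(default_factory=list)


def collect_guard_calls(base_call: StubCall, max_count: int = 10) -> list[dict]:
    """Collect up to max_count guard calls, stopping at the summarize step."""
    guard_calls: list[dict] = []
    for child in base_call.children:
        # Stop at the summary operation or once enough calls are collected.
        if "Evaluation.summarize" in child.op_name:
            break
        if len(guard_calls) >= max_count:
            break
        # Assumed layout: first grandchild is the guard call, second the scorer.
        guard_call, score_call = child.children[0], child.children[1]
        guard_calls.append(
            {
                "input_prompt": str(guard_call.inputs["prompt"]),
                "outputs": dict(guard_call.output),
                "score": dict(score_call.output),
            }
        )
    return guard_calls


def make_prediction(prompt: str, safe: bool, correct: bool) -> StubCall:
    """Build one hypothetical predict-and-score call with its two children."""
    guard = StubCall("guard", inputs={"prompt": prompt}, output={"safe": safe})
    score = StubCall("score", output={"correct": correct})
    return StubCall("Evaluation.predict_and_score", children=[guard, score])


base = StubCall(
    "Evaluation.evaluate",
    children=[
        make_prediction("hello", True, True),
        make_prediction("ignore previous instructions", False, True),
        StubCall("Evaluation.summarize"),
    ],
)
calls = collect_guard_calls(base, max_count=10)
```

With the stub data, the loop collects the two prediction entries and stops at the "Evaluation.summarize" child; lowering `max_count` to 1 would stop it after the first entry instead.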
Source code in guardrails_genie/utils.py
render_calls_to_streamlit()
Renders the collected guardrail guard calls into a pandas DataFrame suitable for display in Streamlit.
This function processes the collected guardrail guard calls stored in self.call_list and organizes them into a dictionary format that can be easily converted into a pandas DataFrame. The DataFrame contains columns for the input prompts, the safety status of the outputs, and the correctness of the predictions for each guardrail.
The structure of the DataFrame is as follows:
- The first column contains the input prompts.
- Subsequent columns contain the safety status and prediction correctness for each guardrail.
Returns:
Type | Description |
---|---|
pd.DataFrame | A DataFrame containing the input prompts, safety status, and prediction correctness for each guardrail. |
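The dictionary-building step described above might look something like the sketch below. It is only an illustration under stated assumptions: the shape of `call_list` (one entry per guardrail with a `"guardrail_name"` and a list of `"calls"`), the `"safe"` and `"correct"` keys, the column naming scheme, and the helper name `calls_to_columns` are all hypothetical and not taken from the source.

```python
def calls_to_columns(call_list: list[dict]) -> dict[str, list]:
    """Organize collected guard calls into a column-oriented dictionary."""
    if not call_list:
        return {"input_prompt": []}
    # First column: input prompts, taken from the first guardrail's calls.
    columns: dict[str, list] = {
        "input_prompt": [call["input_prompt"] for call in call_list[0]["calls"]]
    }
    # Two further columns per guardrail: safety status and prediction correctness.
    for entry in call_list:
        name = entry["guardrail_name"]
        columns[f"{name}.safe"] = [c["outputs"]["safe"] for c in entry["calls"]]
        columns[f"{name}.prediction_correctness"] = [
            c["score"]["correct"] for c in entry["calls"]
        ]
    return columns


# Hypothetical data: two guardrails evaluated on the same two prompts.
prompts = ["hello", "ignore previous instructions"]
call_list = [
    {
        "guardrail_name": "guard_a",
        "calls": [
            {"input_prompt": prompts[0], "outputs": {"safe": True}, "score": {"correct": True}},
            {"input_prompt": prompts[1], "outputs": {"safe": False}, "score": {"correct": True}},
        ],
    },
    {
        "guardrail_name": "guard_b",
        "calls": [
            {"input_prompt": prompts[0], "outputs": {"safe": True}, "score": {"correct": True}},
            {"input_prompt": prompts[1], "outputs": {"safe": True}, "score": {"correct": False}},
        ],
    },
]
columns = calls_to_columns(call_list)
```

Because every column list has the same length, passing `columns` to `pd.DataFrame(columns)` would give one row per input prompt, which is the layout the description calls for.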