
Prompt-image Alignment Metrics

CLIPImageQualityScoreMetric

Bases: BasePromptAlignmentMetric

CLIP Image Quality Assessment metric for measuring the visual content of images.

The metric is based on the CLIP model, which is a neural network trained on a variety of (image, text) pairs to be able to generate a vector representation of the image and the text that is similar if the image and text are semantically similar.

The metric works by calculating the cosine similarity between user provided images and pre-defined prompts. The prompts always come in pairs of “positive” and “negative” such as “Good photo.” and “Bad photo.”. By calculating the similarity between image embeddings and both the “positive” and “negative” prompt, the metric can determine which prompt the image is more similar to. The metric then returns the probability that the image is more similar to the first prompt than the second prompt.
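
For intuition, here is a minimal sketch of that positive/negative comparison using the Hugging Face transformers CLIP classes. This is an illustrative approximation, not the metric's own implementation (which delegates to torchmetrics' clip_image_quality_assessment, as shown in the source below); the image path is hypothetical.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

    image = Image.open("generated_image.png")  # hypothetical path
    inputs = processor(
        text=["Good photo.", "Bad photo."], images=image, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        logits_per_image = model(**inputs).logits_per_image  # scaled cosine similarities
    # Probability that the image is closer to the "positive" prompt than the "negative" one
    positive_prob = logits_per_image.softmax(dim=-1)[0, 0].item()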

Parameters:

    clip_model_name_or_path (str, optional): The name or path of the CLIP model to use. Defaults to "clip_iqa".
    name (str, optional): Name of the metric. Defaults to "clip_image_quality_assessment".
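
Example usage, as a minimal sketch: the import path is inferred from the source location below, the image path is hypothetical, and weave.init can optionally be called first to trace the calls.

    from PIL import Image

    from hemm.metrics.prompt_alignment import CLIPImageQualityScoreMetric

    metric = CLIPImageQualityScoreMetric(clip_model_name_or_path="clip_iqa")
    image = Image.open("generated_image.png")  # hypothetical path

    # Note: compute_metric iterates over the built-in prompts ("quality", "brightness", ...)
    # and ignores the `prompt` argument; it returns one score per built-in prompt.
    scores = metric.compute_metric(image, prompt="a photo of an astronaut riding a horse")
    # keys look like "clip_image_quality_assessment_quality",
    # "clip_image_quality_assessment_brightness", and so on
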
Source code in hemm/metrics/prompt_alignment/clip_iqa_score.py
class CLIPImageQualityScoreMetric(BasePromptAlignmentMetric):
    """[CLIP Image Quality Assessment](https://arxiv.org/abs/2207.12396) metric
    for measuring the visual content of images.

    The metric is based on the [CLIP](https://arxiv.org/abs/2103.00020) model,
    which is a neural network trained on a variety of (image, text) pairs to be
    able to generate a vector representation of the image and the text that is
    similar if the image and text are semantically similar.

    The metric works by calculating the cosine similarity between user provided images
    and pre-defined prompts. The prompts always come in pairs of “positive” and “negative”
    such as “Good photo.” and “Bad photo.”. By calculating the similarity between image
    embeddings and both the “positive” and “negative” prompt, the metric can determine which
    prompt the image is more similar to. The metric then returns the probability that the
    image is more similar to the first prompt than the second prompt.

    Args:
        clip_model_name_or_path (str, optional): The name or path of the CLIP model to use.
            Defaults to "clip_iqa".
        name (str, optional): Name of the metric. Defaults to "clip_image_quality_assessment".
    """

    def __init__(
        self,
        clip_model_name_or_path: str = "clip_iqa",
        name: str = "clip_image_quality_assessment",
    ) -> None:
        super().__init__(name)
        self.clip_iqa_fn = partial(
            clip_image_quality_assessment, model_name_or_path=clip_model_name_or_path
        )
        self.built_in_prompts = [
            "quality",
            "brightness",
            "noisiness",
            "colorfullness",
            "sharpness",
            "contrast",
            "complexity",
            "natural",
            "happy",
            "scary",
            "new",
            "real",
            "beautiful",
            "lonely",
            "relaxing",
        ]
        self.config = {"clip_model_name_or_path": clip_model_name_or_path}

    @weave.op()
    def compute_metric(
        self, pil_image: Image, prompt: str
    ) -> Union[float, Dict[str, float]]:
        images = np.expand_dims(np.array(pil_image), axis=0).astype(np.uint8) / 255.0
        score_dict = {}
        for prompt in tqdm(
            self.built_in_prompts, desc="Calculating IQA scores", leave=False
        ):
            clip_iqa_score = float(
                self.clip_iqa_fn(
                    images=torch.from_numpy(images).permute(0, 3, 1, 2),
                    prompts=tuple([prompt] * images.shape[0]),
                ).detach()
            )
            score_dict[f"{self.name}_{prompt}"] = clip_iqa_score
        return score_dict

    @weave.op()
    async def __call__(
        self, prompt: str, model_output: Dict[str, Any]
    ) -> Dict[str, float]:
        _ = "CLIPImageQualityScoreMetric"
        return super().__call__(prompt, model_output)

CLIPScoreMetric

Bases: BasePromptAlignmentMetric

CLIP score metric for text-to-image similarity. CLIP Score is a reference-free metric that can be used to evaluate the correlation between a generated caption for an image and the actual content of the image. It has been found to be highly correlated with human judgement.
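
Concretely, the score computed by torchmetrics is max(100 * cos(E_I, E_C), 0), where E_I and E_C are the CLIP embeddings of the image and the caption. A minimal sketch using the torchmetrics functional API that this class wraps (the random image mirrors the torchmetrics example):

    import torch
    from torchmetrics.functional.multimodal import clip_score

    image = torch.randint(255, (3, 224, 224), generator=torch.manual_seed(42))
    score = clip_score(
        image, "a photo of a cat", model_name_or_path="openai/clip-vit-base-patch16"
    )
    print(float(score))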

Parameters:

    name (str, optional): Name of the metric. Defaults to "clip_score".
    clip_model_name_or_path (str, optional): The name or path of the CLIP model to use. Defaults to "openai/clip-vit-base-patch16".
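
Example usage, as a minimal sketch (import path inferred from the source location below; the image path is hypothetical):

    from PIL import Image

    from hemm.metrics.prompt_alignment import CLIPScoreMetric

    metric = CLIPScoreMetric(clip_model_name_or_path="openai/clip-vit-base-patch16")
    image = Image.open("generated_image.png")  # hypothetical path
    score = metric.compute_metric(image, prompt="a photo of an astronaut riding a horse")
    # `score` is a float; higher values indicate better prompt-image alignment
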
Source code in hemm/metrics/prompt_alignment/clip_score.py
class CLIPScoreMetric(BasePromptAlignmentMetric):
    """[CLIP score](https://arxiv.org/abs/2104.08718) metric for text-to-image similarity.
    CLIP Score is a reference-free metric that can be used to evaluate the correlation between
    a generated caption for an image and the actual content of the image. It has been found to
    be highly correlated with human judgement.

    Args:
        name (str, optional): Name of the metric. Defaults to "clip_score".
        clip_model_name_or_path (str, optional): The name or path of the CLIP model to use.
            Defaults to "openai/clip-vit-base-patch16".
    """

    def __init__(
        self,
        clip_model_name_or_path: str = "openai/clip-vit-base-patch16",
        name: str = "clip_score",
    ) -> None:
        super().__init__(name)
        self.clip_score_fn = partial(
            clip_score, model_name_or_path=clip_model_name_or_path
        )
        self.config = {"clip_model_name_or_path": clip_model_name_or_path}

    @weave.op()
    def compute_metric(
        self, pil_image: Image.Image, prompt: str
    ) -> Union[float, Dict[str, float]]:
        images = np.expand_dims(np.array(pil_image), axis=0)
        return float(
            self.clip_score_fn(
                torch.from_numpy(images).permute(0, 3, 1, 2), prompt
            ).detach()
        )

    @weave.op()
    async def __call__(
        self, prompt: str, model_output: Dict[str, Any]
    ) -> Dict[str, float]:
        _ = "CLIPScoreMetric"
        return super().__call__(prompt, model_output)

BasePromptAlignmentMetric

Bases: ABC

Base class for Prompt Alignment Metrics.

Parameters:

    name (str): Name of the metric. Required.
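
To add a custom prompt-alignment metric, subclass BasePromptAlignmentMetric and implement compute_metric; the inherited __call__ takes care of decoding the generated image and packaging the result. A minimal sketch with placeholder scoring logic:

    from typing import Dict, Union

    from PIL import Image

    from hemm.metrics.prompt_alignment.base import BasePromptAlignmentMetric


    class ImageAreaMetric(BasePromptAlignmentMetric):
        """Toy metric that ignores the prompt and scores image resolution (placeholder logic)."""

        def __init__(self, name: str = "image_area") -> None:
            super().__init__(name)

        def compute_metric(
            self, pil_image: Image.Image, prompt: str
        ) -> Union[float, Dict[str, float]]:
            width, height = pil_image.size
            return float(width * height)
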
Source code in hemm/metrics/prompt_alignment/base.py
class BasePromptAlignmentMetric(ABC):
    """Base class for Prompt Alignment Metrics.

    Args:
        name (str): Name of the metric.
    """

    def __init__(self, name: str) -> None:
        super().__init__()
        self.scores = []
        self.name = name
        self.config = {}

    @abstractmethod
    def compute_metric(
        self, pil_image: Image.Image, prompt: str
    ) -> Union[float, Dict[str, float]]:
        """Compute the metric for the given image. This is an abstract
        method and must be overridden by the child class implementation.

        Args:
            pil_image (Image.Image): Image in PIL format.
            prompt (str): Prompt for the image generation.

        Returns:
            Union[float, Dict[str, float]]: Metric score.
        """
        pass

    def __call__(self, prompt: str, model_output: Dict[str, Any]) -> Dict[str, float]:
        """Compute the metric for the given image. This method is used as the scorer
        function for `weave.Evaluation` in the evaluation pipelines.

        Args:
            prompt (str): Prompt for the image generation.
            model_output (Dict[str, Any]): Model output containing the generated image.

        Returns:
            Dict[str, float]: Metric score.
        """
        pil_image = Image.open(
            BytesIO(base64.b64decode(model_output["image"].split(";base64,")[-1]))
        )
        score = self.compute_metric(pil_image, prompt)
        self.scores.append(score)
        return {self.name: score}

__call__(prompt, model_output)

Compute the metric for the given image. This method is used as the scorer function for weave.Evaluation in the evaluation pipelines.

Parameters:

    prompt (str): Prompt for the image generation. Required.
    model_output (Dict[str, Any]): Model output containing the generated image. Required.

Returns:

    Dict[str, float]: Metric score.
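
The generated image is expected to arrive in model_output["image"] as a base64-encoded data URI (the payload after ";base64," is decoded, as the source below shows). A minimal sketch of producing such a payload, with a hypothetical image path:

    import base64
    from io import BytesIO

    from PIL import Image

    # Encode a generated image as a data URI, the format __call__ expects
    image = Image.open("generated_image.png")  # hypothetical path
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    encoded = base64.b64encode(buffer.getvalue()).decode("utf-8")
    model_output = {"image": f"data:image/png;base64,{encoded}"}

    # __call__ recovers the PIL image by inverting the encoding above
    recovered = Image.open(
        BytesIO(base64.b64decode(model_output["image"].split(";base64,")[-1]))
    )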

Source code in hemm/metrics/prompt_alignment/base.py
def __call__(self, prompt: str, model_output: Dict[str, Any]) -> Dict[str, float]:
    """Compute the metric for the given image. This method is used as the scorer
    function for `weave.Evaluation` in the evaluation pipelines.

    Args:
        prompt (str): Prompt for the image generation.
        model_output (Dict[str, Any]): Model output containing the generated image.

    Returns:
        Dict[str, float]: Metric score.
    """
    pil_image = Image.open(
        BytesIO(base64.b64decode(model_output["image"].split(";base64,")[-1]))
    )
    score = self.compute_metric(pil_image, prompt)
    self.scores.append(score)
    return {self.name: score}

compute_metric(pil_image, prompt) abstractmethod

Compute the metric for the given image. This is an abstract method and must be overridden by the child class implementation.

Parameters:

    pil_image (Image.Image): Image in PIL format. Required.
    prompt (str): Prompt for the image generation. Required.

Returns:

    Union[float, Dict[str, float]]: Metric score.

Source code in hemm/metrics/prompt_alignment/base.py
@abstractmethod
def compute_metric(
    self, pil_image: Image.Image, prompt: str
) -> Union[float, Dict[str, float]]:
    """Compute the metric for the given image. This is an abstract
    method and must be overridden by the child class implementation.

    Args:
        pil_image (Image.Image): Image in PIL format.
        prompt (str): Prompt for the image generation.

    Returns:
        Union[float, Dict[str, float]]: Metric score.
    """
    pass