Creating a LabelGenerator

Annotation tasks are often more complex than having each annotator assign a single label to each data point. In such cases, you need to create a custom class that inherits from LabelGenerator and transforms the raw annotations into numeric encodings from which agreement can be computed.

Consider an example. As before, we have multiple annotators per example, but this time we’ve also instructed them to provide a subjective confidence score for each annotation. The confidence score is on a 5-point Likert scale from 1 to 5, where 5 means “absolutely certain”.

import numpy as np
import pandas as pd

annotations = pd.DataFrame(
   {"Larry_label":      ["yes", "no", "no", "yes", "yes"],
    "Larry_confidence": [4, 2, 5, 5, 3],
    "Curly_label":      ["no", "no", "yes", "no", "yes"],
    "Curly_confidence": [5, 5, 4, 2, 3],
    "Moe_label":        ["yes", "yes", "yes", "no", "yes"],
    "Moe_confidence":   [2, 5, 3, 4, 3]}
 )
print(annotations)

  Larry_label  Larry_confidence Curly_label  Curly_confidence Moe_label  Moe_confidence
0         yes                 4          no                 5       yes               2
1          no                 2          no                 5       yes               5
2          no                 5         yes                 4       yes               3
3         yes                 5          no                 2        no               4
4         yes                 3         yes                 3       yes               3

We have to determine three things:

  1. How to combine a single annotator’s label and confidence score into a probability distribution over classes.

  2. How to combine the individual annotators’ probabilities into a single probability distribution for a sample.

  3. Same as item 2, but producing a single “hard” consensus label instead of a probability distribution.

For item 1, a confidence of 5 leaves the label unchanged, while a confidence of 1 results in a uniform distribution over the classes. The other confidence levels fall between these two extremes.
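To make this mapping concrete, here is a minimal standalone sketch for the two-class case. The helper name soften_onehot is hypothetical and not part of effiara, and the linear spacing of the weights between the two extremes is an assumption.

```python
import numpy as np


def soften_onehot(onehot, confidence):
    """Blend a binary one-hot label toward uniform as confidence drops.

    Hypothetical helper for illustration only; assumes exactly two
    classes and an integer confidence on a 1-5 scale.
    """
    # Assumed linear spacing: confidence 1 -> 0.5, confidence 5 -> 0.0
    weights = np.linspace(0.5, 0.0, 5)
    weight = weights[confidence - 1]
    return np.abs(np.asarray(onehot, dtype=float) - weight)


yes = [0.0, 1.0]  # assumes the mapping {"no": 0, "yes": 1}
for conf in range(1, 6):
    print(conf, soften_onehot(yes, conf))
```

Under these assumptions, a “yes” with confidence 1 becomes [0.5, 0.5], with confidence 3 it becomes [0.25, 0.75], and with confidence 5 it stays at [0.0, 1.0].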

For item 2, we’ll average the probabilities from item 1 across annotators. For item 3, we’ll take the argmax of the result of item 2.
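The averaging and argmax steps can be sketched on the second row of the table (index 1: Larry “no” with confidence 2, Curly “no” with confidence 5, Moe “yes” with confidence 5). The per-annotator distributions below are hard-coded under the assumption that probabilities are interpolated linearly between uniform (confidence 1) and one-hot (confidence 5). The variable names are illustrative only.

```python
import numpy as np

# Per-annotator distributions over ["no", "yes"] for the row at index 1,
# assuming linear interpolation between uniform and one-hot.
annotator_probs = np.array([
    [0.625, 0.375],  # Larry: "no", confidence 2
    [1.0, 0.0],      # Curly: "no", confidence 5
    [0.0, 1.0],      # Moe: "yes", confidence 5
])

# Item 2: average across annotators (axis 0) to get one distribution.
consensus_prob = annotator_probs.mean(axis=0)

# Item 3: one-hot encode the most probable class.
consensus_hard = np.zeros_like(consensus_prob)
consensus_hard[np.argmax(consensus_prob)] = 1.0

print(consensus_prob)  # roughly [0.54, 0.46]
print(consensus_hard)
```

Here the consensus leans toward “no”. Note that np.argmax returns the first maximal index, so an exact tie between classes resolves to the class with the lower index.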

These are implemented below as the add_annotation_prob_labels, add_sample_prob_labels, and add_sample_hard_labels methods of our custom ConfidenceLabelGenerator.

confidence_label_generator.py

import numpy as np
import pandas as pd
from effiara.label_generators import LabelGenerator


class ConfidenceLabelGenerator(LabelGenerator):

    def _to_onehot(self, lab):
        # self.label_mapping is an argument to __init__
        onehot = np.zeros(len(self.label_mapping))
        idx = self.label_mapping[lab]
        onehot[idx] = 1.
        return onehot

    def add_annotation_prob_labels(self, df):
        """
        Convert the annotation of a single annotator on a single
        example into a probability distribution over classes.
        """
        conf_scale = np.arange(1, 6)
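        # Weights are evenly spaced from 0.5 (confidence 1) down to 0.0 (confidence 5).
        # Note: np.abs(y - weight) only yields a valid distribution for two classes.
        weights = np.linspace(0.5, 0.0, 5)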
        conf2weight = dict(zip(conf_scale, weights))

        def add_prob_label_to_row(row):
            # self.annotators is an argument to __init__
            for annotator in self.annotators:
                y = self._to_onehot(row[f"{annotator}_label"])
                conf = row[f"{annotator}_confidence"]
                weight = conf2weight[conf]
                row[f"{annotator}_prob"] = np.abs(y - weight)
            return row

        return df.apply(add_prob_label_to_row, axis=1)

    def add_sample_prob_labels(self, df, reliability_dict=None):
        """
        Aggregate annotations from multiple annotators into a single
        probability distribution.
        """
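        prob_label_cols = [c for c in df.columns if c.endswith("_prob")]
        if len(prob_label_cols) == 0:
            df = self.add_annotation_prob_labels(df)
            # Recompute the column list now that the *_prob columns exist.
            prob_label_cols = [c for c in df.columns if c.endswith("_prob")]

        def compute_avg_prob_label(row):
            # Stack the per-annotator distributions, then average across annotators.
            probs = np.stack(list(row[prob_label_cols]))
            row["consensus_prob"] = probs.mean(axis=0)
            return row

        return df.apply(compute_avg_prob_label, axis=1)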


    def add_sample_hard_labels(self, df):
        """
        Aggregate annotations from multiple annotators into a single
        'hard' onehot encoding.
        """
        if "consensus_prob" not in df.columns:
            df = self.add_sample_prob_labels(df)

        dfcp = df.copy()
        onehots = np.zeros((len(df), len(self.label_mapping)))
        idxs = df["consensus_prob"].apply(np.argmax)
        onehots[np.arange(len(df)), idxs] = 1
        dfcp["consensus_hard"] = list(onehots)
        return dfcp

Once all this is done, we can pass an instance of our ConfidenceLabelGenerator to the Annotations class to use the new labels when computing agreement.

from effiara.annotator_reliability import Annotations
from confidence_label_generator import ConfidenceLabelGenerator  # The class we defined above.

label_mapping = {"no": 0, "yes": 1}
annotators = ["Larry", "Curly", "Moe"]
label_generator = ConfidenceLabelGenerator(annotators, label_mapping)
annos = Annotations(annotations,  # we defined this above.
                    label_generator=label_generator,
                    agreement_metric="cosine",
                    agreement_suffix="_prob")  # Compute agreement from these columns
print(annos)
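For intuition about what a cosine-based agreement metric measures, the following standalone sketch computes the cosine similarity between pairs of annotator probability vectors. It assumes that “cosine” refers to ordinary cosine similarity. It does not reproduce how the Annotations class computes or aggregates agreement, and cosine_similarity is a hypothetical helper, not an effiara function.

```python
import numpy as np


def cosine_similarity(p, q):
    """Cosine of the angle between two non-zero vectors (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))


# Distributions over ["no", "yes"], assumed for illustration.
larry = [0.625, 0.375]  # leans "no" with low confidence
curly = [1.0, 0.0]      # certain "no"
moe = [0.0, 1.0]        # certain "yes"

print(cosine_similarity(larry, curly))  # high: both lean toward "no"
print(cosine_similarity(curly, moe))    # 0.0: orthogonal, complete disagreement
```

Because soft labels carry confidence, two annotators who pick the same class with different certainty score below 1, while two fully confident annotators who disagree score 0.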