Creating a LabelGenerator
Annotation tasks are often more complex than simply making each annotator assign a single
label to each data point. When this is the case, it is necessary to create a custom
class that inherits from LabelGenerator to transform the raw annotations into numeric encodings
that can be used to compute agreement.
Let’s give an example. As before, we have multiple annotators per example, but we’ve also instructed them to provide a subjective confidence score regarding their annotations. The confidence score is a Likert scale from 1-5, where 5 means “absolutely certain”.
import numpy as np
import pandas as pd
annotations = pd.DataFrame(
    {"Larry_label": ["yes", "no", "no", "yes", "yes"],
     "Larry_confidence": [4, 2, 5, 5, 3],
     "Curly_label": ["no", "no", "yes", "no", "yes"],
     "Curly_confidence": [5, 5, 4, 2, 3],
     "Moe_label": ["yes", "yes", "yes", "no", "yes"],
     "Moe_confidence": [2, 5, 3, 4, 3]}
)
print(annotations)
   Larry_label  Larry_confidence  Curly_label  Curly_confidence  Moe_label  Moe_confidence
0          yes                 4           no                 5        yes               2
1           no                 2           no                 5        yes               5
2           no                 5          yes                 4        yes               3
3          yes                 5           no                 2         no               4
4          yes                 3          yes                 3        yes               3
We have to determine three things:
1. How to combine a single annotator’s label and confidence score into a probability distribution over classes.
2. How to combine the individual annotators’ probabilities into a single probability distribution for a sample.
3. How to reduce the sample-level probability from item 2 to a single “hard” consensus label.
For item 1, let’s make it so a confidence of 5 doesn’t change the label at all, while a confidence of 1 results in a uniform distribution. The other confidence levels will fall between these.
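To make the rule for item 1 concrete, here is a minimal standalone sketch. The helper name `soften` and its `n_levels` parameter are hypothetical and not part of EffiARA; the sketch assumes a binary task with the mapping {"no": 0, "yes": 1} and weights spaced evenly from 0.5 (confidence 1) down to 0.0 (confidence 5).

```python
import numpy as np


def soften(onehot, confidence, n_levels=5):
    """Hypothetical helper: pull a one-hot binary label toward uniform
    as the annotator's confidence drops."""
    # confidence 1 -> weight 0.5 (uniform), confidence n_levels -> weight 0.0
    weights = np.linspace(0.5, 0.0, n_levels)
    weight = weights[confidence - 1]
    return np.abs(np.asarray(onehot, dtype=float) - weight)


yes = [0.0, 1.0]  # one-hot for "yes", assuming {"no": 0, "yes": 1}
for conf in range(1, 6):
    print(conf, soften(yes, conf))
# 1 [0.5 0.5]
# 2 [0.375 0.625]
# 3 [0.25 0.75]
# 4 [0.125 0.875]
# 5 [0. 1.]
```

Note that subtracting a single weight and taking the absolute value only yields a valid probability distribution when there are exactly two classes; with more classes the entries would no longer sum to 1, so a different blending rule would be needed.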
For item 2, we’ll average the probabilities from item 1 across annotators. For item 3, we’ll just take the argmax of the result from item 2.
These are implemented below as the add_annotation_prob_labels, add_sample_prob_labels, and add_sample_hard_labels
methods of our custom ConfidenceLabelGenerator.
confidence_label_generator.py
import numpy as np
import pandas as pd

from effiara.label_generators import LabelGenerator


class ConfidenceLabelGenerator(LabelGenerator):

    def _to_onehot(self, lab):
        # self.label_mapping is an argument to __init__
        onehot = np.zeros(len(self.label_mapping))
        idx = self.label_mapping[lab]
        onehot[idx] = 1.
        return onehot

    def add_annotation_prob_labels(self, df):
        """
        Convert the annotation of a single annotator on a single
        example into a probability distribution over classes.
        """
        conf_scale = np.arange(1, 6)
        # Weights are evenly spaced from 0.5 (confidence 1) down to 0.0
        # (confidence 5), so full confidence leaves the one-hot label
        # unchanged and minimum confidence gives a uniform distribution.
        weights = np.linspace(0.5, 0.0, 5)
        conf2weight = dict(zip(conf_scale, weights))

        def add_prob_label_to_row(row):
            # self.annotators is an argument to __init__
            for annotator in self.annotators:
                y = self._to_onehot(row[f"{annotator}_label"])
                conf = row[f"{annotator}_confidence"]
                weight = conf2weight[conf]
                row[f"{annotator}_prob"] = np.abs(y - weight)
            return row

        return df.apply(add_prob_label_to_row, axis=1)
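
    def add_sample_prob_labels(self, df, reliability_dict=None):
        """
        Aggregate annotations from multiple annotators into a single
        probability distribution.
        """
        prob_label_cols = [c for c in df.columns if c.endswith("_prob")]
        if len(prob_label_cols) == 0:
            df = self.add_annotation_prob_labels(df)
            # Re-collect the columns now that they have been added.
            prob_label_cols = [c for c in df.columns if c.endswith("_prob")]

        def compute_avg_prob_label(row):
            # Stack the per-annotator distributions and average them
            # class by class.
            probs = np.stack(row[prob_label_cols].to_list())
            row["consensus_prob"] = np.mean(probs, axis=0)
            return row

        return df.apply(compute_avg_prob_label, axis=1)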

    def add_sample_hard_labels(self, df):
        """
        Aggregate annotations from multiple annotators into a single
        'hard' onehot encoding.
        """
        if "consensus_prob" not in df.columns:
            df = self.add_sample_prob_labels(df)
        dfcp = df.copy()
        onehots = np.zeros((len(df), len(self.label_mapping)))
        idxs = df["consensus_prob"].apply(np.argmax)
        onehots[np.arange(len(df)), idxs] = 1
        dfcp["consensus_hard"] = list(onehots)
        return dfcp
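As a sanity check on what the three methods are meant to produce, the sketch below recomputes the example data by hand with plain NumPy, without EffiARA. It assumes the mapping {"no": 0, "yes": 1} and weights that run from 0.5 at confidence 1 to 0.0 at confidence 5, as described for item 1; the array names are illustrative only.

```python
import numpy as np

# Rows are samples, columns are Larry, Curly, Moe; 1 = "yes", 0 = "no".
labels = np.array([
    [1, 0, 1],
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
])
confidence = np.array([
    [4, 5, 2],
    [2, 5, 5],
    [5, 4, 3],
    [5, 2, 4],
    [3, 3, 3],
])

# Item 1: per-annotator distributions, shape (samples, annotators, classes).
weights = np.linspace(0.5, 0.0, 5)[confidence - 1]
onehot = np.eye(2)[labels]
annotation_probs = np.abs(onehot - weights[..., None])

# Item 2: average over annotators.
consensus_prob = annotation_probs.mean(axis=1)

# Item 3: one-hot encoding of the argmax.
consensus_hard = np.eye(2)[consensus_prob.argmax(axis=1)]

print(consensus_prob)
print(consensus_hard)
```

With this data, samples 0 and 3 average to exactly [0.5, 0.5]. Because `np.argmax` returns the first maximal index, such ties resolve to class 0 ("no"), which is worth keeping in mind when interpreting the hard labels.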
Once all this is done, we can pass an instance of our ConfidenceLabelGenerator
to the Annotations class to use the new labels when computing agreement.
from effiara.annotator_reliability import Annotations
from .confidence_label_generator import ConfidenceLabelGenerator # The class we defined above.
label_mapping = {"no": 0, "yes": 1}
annotators = ["Larry", "Curly", "Moe"]
label_generator = ConfidenceLabelGenerator(annotators, label_mapping)
annos = Annotations(annotations,  # we defined this above.
                    label_generator=label_generator,
                    agreement_metric="cosine",
                    agreement_suffix="_prob")  # Compute agreement from these columns
print(annos)
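For intuition about why soft labels matter for a cosine-based metric, the sketch below computes plain cosine similarity between the per-annotator distributions for the first sample. This illustrates the general idea only: the `cosine_similarity` helper is hypothetical, and how EffiARA actually computes and aggregates agreement internally is not shown here and may differ.

```python
import numpy as np


def cosine_similarity(u, v):
    """Hypothetical helper: cosine of the angle between two vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Soft labels for sample 0, assuming {"no": 0, "yes": 1} and weights
# running from 0.5 (confidence 1) to 0.0 (confidence 5).
larry = [0.125, 0.875]  # "yes" with confidence 4
curly = [1.0, 0.0]      # "no" with confidence 5
moe = [0.375, 0.625]    # "yes" with confidence 2

print("Larry vs Moe:  ", cosine_similarity(larry, moe))
print("Larry vs Curly:", cosine_similarity(larry, curly))
```

Larry and Moe chose the same label, so their similarity is high, but it stays below 1 because their confidence differs. Larry and Curly disagree, yet their similarity is above 0 because Larry was not fully certain.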