API

class effiara.annotator_reliability.Annotations(df: DataFrame, label_generator: LabelGenerator | None = None, agreement_metric: str = 'krippendorff', agreement_suffix: str = '_label', agreement_type: str = 'nominal', overlap_threshold: int = 15, merge_labels: dict | None = None, reliability_alpha: float = 0.5, reannotations: bool = False, strength: float = 1)

Class to hold all annotation information for the EffiARA annotation framework. Methods include inter- and intra-annotator agreement calculations, as well as the overall reliability calculation and other utilities.

label_generator

label generator to create individual annotation labels and soft/hard aggregations.

Type:: effiara.LabelGenerator

annotators

list of annotator names.

Type:: list

num_annotators

number of annotators

Type:: int

label_mapping

label mapping of what is in the dataframe to what should be used for agreement/training.

Type:: dict

num_classes

number of classes.

Type:: int

agreement_metric

agreement metric to be used.

Type:: str

agreement_suffix

label suffix to get the agreement from (such as “_label” as the default).

Type:: str

agreement_type

type of agreement (e.g. nominal, ordinal).

Type:: str

merge_labels

dict of labels to merge.

Type:: dict

reannotation

whether the dataframe contains re-annotations under the re_* columns.

Type:: bool

strength

strength of reliability calculations (higher strength will lead to more polarised reliability values).

Type:: float

calculate_annotator_reliability(alpha=0.5, epsilon=0.001)

Recursively calculate annotator reliability, using: intra-annotator agreement, inter-annotator agreement, or a mixture, controlled by the alpha and beta parameters. Alpha and Beta must sum to 1.0.

Parameters:

alpha (float) – Default 0.5. Value between 0 and 1 controlling weight of intra-annotator agreement.
beta (float) – Default 0.5. Value between 0 and 1 controlling weight of inter-annotator agreement.
epsilon (float) – Default 0.001. Controls the maximum change from the last iteration to indicate convergence.
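The iteration described above can be sketched in plain Python. This is an illustrative sketch, not the library's implementation: the function name, the input layout (`intra` and `inter` dictionaries), the `max_iter` cap, and the choice of beta = 1 - alpha are all assumptions, and the `strength` setting is not modelled.

```python
def estimate_reliability(intra, inter, alpha=0.5, epsilon=0.001, max_iter=1000):
    """Iteratively estimate reliability scores (illustrative sketch).

    intra: {annotator: intra-annotator agreement}  (hypothetical layout)
    inter: {(annotator_a, annotator_b): pairwise agreement}  (hypothetical layout)
    """
    beta = 1.0 - alpha  # assumption: beta is the complement of alpha
    annotators = sorted(intra)
    reliability = {a: 1.0 for a in annotators}  # every annotator starts equal

    def agreement(a, b):
        # Pairwise agreement is symmetric, so look up either ordering.
        return inter.get((a, b), inter.get((b, a)))

    for _ in range(max_iter):
        raw = {}
        for a in annotators:
            # Average agreement with neighbours, weighted by their reliability.
            pairs = [(reliability[b], agreement(a, b)) for b in annotators
                     if b != a and agreement(a, b) is not None]
            total_weight = sum(w for w, _ in pairs)
            avg_inter = (sum(w * s for w, s in pairs) / total_weight
                         if total_weight else 0.0)
            raw[a] = alpha * intra[a] + beta * avg_inter
        mean = sum(raw.values()) / len(raw)
        updated = {a: v / mean for a, v in raw.items()}  # normalise to mean 1
        change = max(abs(updated[a] - reliability[a]) for a in annotators)
        reliability = updated
        if change < epsilon:  # converged: largest change is below epsilon
            break
    return reliability


scores = estimate_reliability(
    intra={"a": 0.9, "b": 0.8, "c": 0.5},
    inter={("a", "b"): 0.8, ("a", "c"): 0.4, ("b", "c"): 0.3},
)
```

Because the scores are normalised to a mean of 1 on every pass, a value above 1 marks an annotator as more reliable than average under this sketch.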

calculate_avg_inter_annotator_agreement(): Calculate each annotator’s average agreement as a weighted average of their agreement with the annotators around them. The average is weighted by the overall reliability score of each annotator.

calculate_inter_annotator_agreement(): Calculate the inter-annotator agreement between each pair of annotators. Each agreement value is stored on the edge of the graph between the nodes representing the two annotators.

calculate_intra_annotator_agreement(): Calculate intra-annotator agreement.

calculate_overall_inter_annnotator_agreement(): Calculate the overall inter-annotator agreement metric for the whole dataset. Currently only Krippendorff’s alpha is implemented.

display_agreement_heatmap(annotators: list | None = None, other_annotators: list | None = None, display_upper=False)

Plot a heatmap of agreement metric values for the annotators.

If both annotators and other_annotators are specified, compares users in annotators to those in other_annotators. Otherwise, compares all project annotators to each other.

Parameters:

annotators (list) – Optional.
other_annotators (list) – Optional.

Returns:

A matrix of the data displayed on the graph. List[str]: List of annotators in the order of the matrix rows.

Return type:

np.ndarray

display_annotator_graph(legend=False): Display the annotation graph.

generate_final_labels_and_sample_weights(): Generate the final labels and sample weights for the dataframe.

get_agreement(user_1, user_2) → float | None

Get the agreement between two annotators.

Parameters:

user_1 (str) – the name of the first annotator.
user_2 (str) – the name of the second annotator.

Returns:

agreement between the two annotators (or None).

Return type:

Optional[float]

get_reliability_dict()

Get a dictionary of reliability scores per username.

Returns:: dictionary of key=username, value=reliability.
Return type:: dict

get_user_reliability(username)

Get the reliability of a given annotator.

Parameters:: username (str) – username of the annotator.
Returns:: reliability score of the annotator.
Return type:: float

init_annotator_graph(): Initialise the annotator graph with an initial reliability of 1. This means each annotator will initially be weighted equally.

normalise_edge_property(property)

Normalise an edge property to have a mean of 1.

Parameters:: property (str) – the name of the edge property to normalise.

normalise_node_property(property)

Normalise a node property to have a mean of 1.

Parameters:: property (str) – the name of the node property to normalise.
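Normalising a property to a mean of 1 amounts to dividing every value by the current mean. The sketch below shows this on a plain dictionary; the function name and the dictionary layout are hypothetical and stand in for the graph's node or edge properties.

```python
def normalise_to_mean_one(values):
    """Rescale a {name: value} mapping so that its values average to 1."""
    mean = sum(values.values()) / len(values)
    return {name: value / mean for name, value in values.items()}


normalised = normalise_to_mean_one({"a": 2.0, "b": 4.0, "c": 6.0})
```

Relative differences are preserved: an annotator with twice the raw score of another still has twice the normalised score.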

replace_labels(): Merge labels. Uses find and replace, so do not swap labels with each other, e.g. {“misinfo”: [“debunk”], “debunk”: [“misinfo”, “other”]}.
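The warning about swapping labels can be illustrated with a small sequential find-and-replace. The helper below is a hypothetical stand-in, written on the assumption that each key of the merge dictionary is the target label and each value lists the labels to be replaced by it.

```python
def merge_labels_sequentially(labels, merge_labels):
    """Apply each merge rule in turn, as a find and replace would."""
    merged = list(labels)
    for new_label, old_labels in merge_labels.items():
        merged = [new_label if label in old_labels else label
                  for label in merged]
    return merged


labels = ["debunk", "misinfo", "other"]

# A one-directional merge behaves as expected.
safe = merge_labels_sequentially(labels, {"misinfo": ["debunk"]})

# Swapping labels does not: the second rule also rewrites the labels
# that the first rule has just produced.
swapped = merge_labels_sequentially(
    labels, {"misinfo": ["debunk"], "debunk": ["misinfo", "other"]}
)
```

In the swapped case every label ends up as "debunk", because the rules are applied one after another rather than simultaneously.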

annotators

Type:: list

num_annotators

Type:: int

time_available

Type:: float

annotation_rate

Type:: float

num_samples

Type:: int

double_proportion

Type:: float

re_proportion

Type:: float

create_example_distribution_df(): Create a simple DataFrame to test sample distribution.

distribute_samples(df: DataFrame, save_path: str | None = None, all_reannotation: bool = False) → Dict[str, DataFrame]

Distribute samples based on the sample distributor settings.

Parameters:

df (pd.DataFrame) – dataframe containing samples with each row being a separate sample - using a copy is recommended.
save_path (str) – (Optional) If not None, dir path to save all data to. If not supplied, a dict of allocations is returned. Default None.
all_reannotation (bool) – whether re-annotations should be sampled from all the user’s annotations rather than just single annotations. In this case, a double annotation project amount is sampled from all their annotations.

Returns:

Mapping from usernames to assigned samples.

Return type:

dict

Solves the annotation framework equation to find the missing variable. Only one of the available arguments should be omitted.

Parameters:

num_annotators (int) – number of annotators available [n].
time_available (float) – time available for each annotator (assuming they all have the same time available) [t].
annotation_rate (float) – expected rate of annotation per unit time (same unit as time_available) [rho].
num_samples (int) – number of desired samples [k].
double_proportion (float) – proportion of the whole dataset that should be double-annotated samples (0 <= d <= 1) [d].
re_proportion (float) – proportion of single-annotated samples that should be re-annotated (0 <= r <= 1) [r].
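One way to relate the six variables is to count annotations. This is an assumption based on the parameter descriptions above rather than a formula stated on this page: k samples need 2dk annotations for the double-annotated part, (1 - d)k for the single-annotated part, and r(1 - d)k re-annotations, while n annotators working for time t at rate rho can produce rho * t * n annotations. Equating the two gives k = rho * t * n / (2d + (1 + r)(1 - d)). The function names below are hypothetical.

```python
def annotations_per_sample(d, r):
    """Average number of annotations each sample needs under this assumption."""
    return 2 * d + (1 + r) * (1 - d)


def solve_for_num_samples(n, t, rho, d, r):
    """Number of samples k that the annotators can cover."""
    return rho * t * n / annotations_per_sample(d, r)


def solve_for_time_available(n, rho, k, d, r):
    """Time t each annotator needs to cover k samples."""
    return k * annotations_per_sample(d, r) / (rho * n)


# 6 annotators, 10 hours each, 60 annotations per hour,
# one third double-annotated, half of the single annotations repeated.
k = solve_for_num_samples(n=6, t=10, rho=60, d=1 / 3, r=0.5)
t = solve_for_time_available(n=6, rho=60, k=k, d=1 / 3, r=0.5)
```

Solving for any other variable follows the same pattern of rearranging the single equation.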

output_variables(): Output all variables.

set_project_distribution(): Set project distributions once all values have been defined.

class effiara.label_generators.LabelGenerator(annotators: list, label_mapping: dict, label_suffixes: List[str] | None = None)

Abstract class for generation of labels for set of annotations.

This class should be subclassed for each individual annotation project. The subclass should override the following methods: add_annotation_prob_labels, add_sample_prob_labels, add_sample_hard_labels

That is, create a new file with the following:

from effiara import LabelGenerator

class MyLabelGenerator(LabelGenerator):

    def add_annotation_prob_labels(self, df):
        ...

    def add_sample_prob_labels(self, df, reliability_dict):
        ...

    def add_sample_hard_labels(self, df):
        ...

abstractmethod add_annotation_prob_labels(df: DataFrame) → DataFrame

Add probability distribution (soft) labels to each individual annotation.

Parameters:: df (pd.DataFrame) – dataframe with all annotation data to add probability label column to.
Returns:: dataframe with added labels.
Return type:: (pd.DataFrame)

abstractmethod add_sample_hard_labels(df) → DataFrame

Implemented to give each sample a one-hot hard label for use in the classification task.

Parameters:: df (pd.DataFrame) – dataframe with all annotation data to add probability label column to.
Returns:: dataframe with added labels.
Return type:: (pd.DataFrame)

abstractmethod add_sample_prob_labels(df: DataFrame, reliability_dict: dict) → DataFrame

Add probability distribution (soft) labels to each individual sample, likely using some combination of annotation probability labels. Can optionally add a sample_weight column to weight samples in training based on annotator reliability.

Parameters:

df (pd.DataFrame) – dataframe with all annotation data to add probability label column to.
reliability_dict (dict) – dict of each annotator and their reliability score.

Returns:

dataframe with added labels.

Return type:

(pd.DataFrame)
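One possible way to combine annotation-level soft labels into a sample-level soft label is a reliability-weighted average. The sketch below works on a single sample using plain lists. The function name and input layout are hypothetical, and a real subclass would apply similar logic row by row to the dataframe.

```python
def aggregate_soft_labels(annotation_labels, reliability_dict):
    """Combine per-annotator soft labels for one sample.

    annotation_labels: {annotator: [p_class_0, p_class_1, ...]}
    reliability_dict:  {annotator: reliability score}
    """
    num_classes = len(next(iter(annotation_labels.values())))
    totals = [0.0] * num_classes
    for annotator, soft_label in annotation_labels.items():
        weight = reliability_dict[annotator]
        for index, probability in enumerate(soft_label):
            totals[index] += weight * probability
    norm = sum(totals)
    return [value / norm for value in totals]  # renormalise to sum to 1


sample_label = aggregate_soft_labels(
    {"a": [1.0, 0.0], "b": [0.0, 1.0]},
    {"a": 1.5, "b": 0.5},
)
```

Here the two annotators disagree completely, and the more reliable annotator pulls the aggregated label towards their choice.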

classmethod from_annotations(df: DataFrame, num_classes=None)

Initialize from an annotations dataframe. Relies on labels being stored in the _label columns.

Parameters:

df (pd.DataFrame) – annotations, must contain _label columns.
num_classes (int) – number of classes. If None, it is inferred from df.

Functions for computing agreement metrics.

effiara.agreement.calculate_krippendorff_alpha_per_label(pair_df, annotator_1_col, annotator_2_col, agreement_type='nominal')

Calculate Krippendorff’s alpha for each label and return the average.

Requires the data in the given columns to be a binarised array of each label (i.e. whether the label is present in the given sample).

Parameters:

annotator_1_col (str) – column containing the binarised annotations for the first annotator.
annotator_2_col (str) – column containing the binarised annotations for the second annotator.
agreement_type (str) –
type of agreement.
- nominal
- ordinal
- interval
- ratio

Returns:

average Krippendorff’s alpha across labels.

Return type:

float
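The per-label calculation can be sketched for the simplest case: two annotators, nominal agreement, and no missing values. This is a self-contained illustration and not the library's code. The function names are hypothetical, and returning 1.0 when only one value is ever observed is an assumption, since alpha is undefined there.

```python
from collections import Counter


def nominal_krippendorff_alpha(values_1, values_2):
    """Nominal Krippendorff's alpha for two annotators, no missing values."""
    coincidences = Counter()
    for v1, v2 in zip(values_1, values_2):
        # Each unit contributes both orderings of its pair of values.
        coincidences[(v1, v2)] += 1
        coincidences[(v2, v1)] += 1
    totals = Counter()
    for (value, _), count in coincidences.items():
        totals[value] += count
    n = sum(totals.values())
    observed = sum(count for (c, k), count in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k]
                   for c in totals for k in totals if c != k)
    if expected == 0:
        return 1.0  # assumption: a single observed value counts as agreement
    return 1.0 - (n - 1) * observed / expected


def krippendorff_alpha_per_label(rows_1, rows_2):
    """Average alpha over the columns of two binarised label arrays."""
    num_labels = len(rows_1[0])
    alphas = [
        nominal_krippendorff_alpha([row[i] for row in rows_1],
                                   [row[i] for row in rows_2])
        for i in range(num_labels)
    ]
    return sum(alphas) / num_labels


rows = [[1, 0], [0, 1], [1, 1], [0, 0]]
identical = krippendorff_alpha_per_label(rows, rows)
partial = nominal_krippendorff_alpha([0, 0, 1, 1], [0, 1, 1, 1])
```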

effiara.agreement.cosine_similarity(vector_a, vector_b)

Calculate the cosine similarity between two vectors.

Parameters:

vector_a (np.ndarray)
vector_b (np.ndarray)

Returns:

cosine similarity between the two vectors.

Return type:

float

Raises:

ZeroDivisionError – when vector_a or vector_b is the zero vector.
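The behaviour described above, including the error for a zero vector, can be reproduced with a few lines of standard-library Python. This sketch uses plain lists rather than np.ndarray and is not the library's implementation.

```python
import math


def cosine_similarity(vector_a, vector_b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(vector_a, vector_b))
    norm_a = math.sqrt(sum(a * a for a in vector_a))
    norm_b = math.sqrt(sum(b * b for b in vector_b))
    # Dividing by a zero norm raises ZeroDivisionError in plain Python.
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([0.5, 0.5], [1.0, 1.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

try:
    cosine_similarity([0.0, 0.0], [1.0, 0.0])
    zero_vector_raised = False
except ZeroDivisionError:
    zero_vector_raised = True
```

Soft labels that sum to 1 are never the zero vector, so the error case mainly guards against malformed input.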

effiara.agreement.inter_annotator_agreement_krippendorff(df, label_cols, label_mapping)

Calculate overall Krippendorff’s alpha inter-annotator agreement metric.

Parameters:

df (pd.DataFrame) – dataframe containing all labels.
label_cols (List[str]) – annotators’ label columns to calculate agreement among.
label_mapping (dict) – mapping between labels in datasets to numeric label.

Returns:

Krippendorff’s alpha agreement metric.

Return type:

float

effiara.agreement.pairwise_agreement(df, user_x, user_y, label_mapping, num_classes, metric='krippendorff', agreement_type='nominal', label_suffix='_label')

Get the pairwise annotator agreement given the full dataframe.

Parameters:

df (pd.DataFrame) – full dataframe containing the whole dataset.
user_x (str) – name of the user in the form user_x.
user_y (str) – name of the user in the form user_y.
metric (str) –
agreement metric to use for inter-/intra-annotator agreement.
- krippendorff: nominal krippendorff’s alpha similarity metric on hard labels only.
- cohen: nominal cohen’s kappa similarity metric on hard labels only.
- fleiss: nominal fleiss kappa similarity metric on hard labels only.
- multi_krippendorff: krippendorff similarity by label for multilabel classification.
- cosine: the cosine similarity metric to be used on soft labels.
- percentage: simple percentage agreement between the two annotators.
agreement_type (str) –
type of agreement.
- nominal
- ordinal
- interval
- ratio
NOTE: currently only working for multi_krippendorff.
label_suffix (str) – suffix for the label being compared.

Returns:

agreement between user_x and user_y.

Return type:

float

effiara.agreement.pairwise_cohens_kappa_agreement(pair_df, heading_1, heading_2, label_mapping)

Cohen’s kappa agreement metric between two annotators, given two headings for each annotator column containing their primary label for each sample.

Does not require any specific formatting of labels within the columns heading_1 and heading_2.

Parameters:

pair_df (pd.DataFrame) – dataframe filtered to contain only the samples that allow agreement calculations.
heading_1 (str) – heading of the first column required to calculate agreement.
heading_2 (str) – heading of the second column required to calculate agreement.
label_mapping (dict) – mapping of labels to numeric values.

Returns:

Cohen’s Kappa.

Return type:

float
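For reference, Cohen's kappa compares the observed agreement with the agreement expected by chance from each annotator's own label frequencies. The sketch below computes it on two plain lists of labels. It is a standalone illustration with a hypothetical function name and does not handle the degenerate case where the expected agreement equals 1.

```python
from collections import Counter


def cohens_kappa(labels_1, labels_2):
    """Cohen's kappa for two annotators labelling the same samples."""
    n = len(labels_1)
    observed = sum(a == b for a, b in zip(labels_1, labels_2)) / n
    counts_1, counts_2 = Counter(labels_1), Counter(labels_2)
    # Chance agreement: product of each annotator's marginal proportions.
    expected = sum(counts_1[label] * counts_2[label]
                   for label in counts_1) / (n * n)
    return (observed - expected) / (1 - expected)


kappa = cohens_kappa(
    ["misinfo", "misinfo", "debunk", "debunk"],
    ["misinfo", "debunk", "debunk", "debunk"],
)
```

The two annotators agree on three of four samples (0.75), chance agreement is 0.5, and kappa is therefore 0.5.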

effiara.agreement.pairwise_cosine_similarity(pair_df, heading_1, heading_2, num_classes=3)

Calculate the cosine similarity between two columns of soft labels.

Requires the two headings to be formatted as a soft label (list or np.array filled with floats summing to 1).

Parameters:

pair_df (pd.DataFrame) – data frame containing annotation data.
heading_1 (str) – heading of first column containing soft labels.
heading_2 (str) – heading of second column containing soft labels.

Returns:

average cosine similarity between the two sets of soft labels.

Return type:

float

effiara.agreement.pairwise_fleiss_kappa_agreement(pair_df, heading_1, heading_2, label_mapping)

Fleiss’ kappa agreement metric between two annotators, given two headings for each annotator column containing their primary label for each sample.

Does not require any specific formatting of labels within the columns heading_1 and heading_2.

Parameters:

pair_df (pd.DataFrame) – dataframe filtered to contain only the samples that allow agreement calculations.
heading_1 (str) – heading of the first column required to calculate agreement.
heading_2 (str) – heading of the second column required to calculate agreement.
label_mapping (dict) – mapping of labels to numeric values.

Returns:

Fleiss’ Kappa.

Return type:

float
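Fleiss' kappa differs from Cohen's kappa in that chance agreement is computed from the pooled label distribution of all raters. A minimal two-rater sketch follows. The function name is hypothetical and the code is a standalone illustration, not the library's implementation.

```python
from collections import Counter


def fleiss_kappa_two_raters(labels_1, labels_2):
    """Fleiss' kappa restricted to exactly two ratings per sample."""
    raters = 2
    n_items = len(labels_1)
    category_totals = Counter()
    agreement_sum = 0.0
    for a, b in zip(labels_1, labels_2):
        counts = Counter([a, b])
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this sample (0 or 1 here).
        agreement_sum += (sum(c * (c - 1) for c in counts.values())
                          / (raters * (raters - 1)))
    mean_agreement = agreement_sum / n_items
    # Chance agreement from the pooled category proportions.
    chance = sum((total / (n_items * raters)) ** 2
                 for total in category_totals.values())
    return (mean_agreement - chance) / (1 - chance)


fleiss = fleiss_kappa_two_raters([0, 0, 1, 1], [0, 1, 1, 1])
```

On these labels the pooled chance agreement is higher than the per-annotator chance agreement used by Cohen's kappa, so the two metrics give slightly different values for the same data.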

effiara.agreement.pairwise_nominal_krippendorff_agreement(pair_df, heading_1, heading_2, label_mapping)

Get the nominal Krippendorff agreement between two annotators, given two headings for each annotator column containing their primary label for each sample.

Does not require any specific formatting of labels within the columns heading_1 and heading_2.

Parameters:

pair_df (pd.DataFrame) – dataframe filtered to contain only the samples that allow agreement calculations.
heading_1 (str) – heading of the first column required to calculate agreement.
heading_2 (str) – heading of the second column required to calculate agreement.
label_mapping (dict) – mapping of labels to numeric values.

Returns:

Krippendorff’s Alpha.

Return type:

float

effiara.agreement.pairwise_percentage_agreement(pair_df, heading_1, heading_2)

Pairwise percentage agreement between two annotators, given two headings for each annotator column containing their primary label for each sample.

Does not require any specific formatting of labels within the columns heading_1 and heading_2.

Parameters:

pair_df (pd.DataFrame) – dataframe filtered to contain only the samples that allow agreement calculations.
heading_1 (str) – heading of the first column required to calculate agreement.
heading_2 (str) – heading of the second column required to calculate agreement.

Returns:

Percentage agreement.

Return type:

float
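Percentage agreement is the share of samples on which both annotators chose the same label. The sketch below returns it as a fraction between 0 and 1. Whether the library reports a fraction or a value out of 100 is not stated here, so the scale is an assumption, as is the function name.

```python
def percentage_agreement(labels_1, labels_2):
    """Fraction of samples on which the two annotators agree."""
    matches = sum(a == b for a, b in zip(labels_1, labels_2))
    return matches / len(labels_1)


agreement = percentage_agreement(
    ["misinfo", "misinfo", "debunk", "debunk"],
    ["misinfo", "debunk", "debunk", "debunk"],
)
```

Unlike the kappa and alpha metrics, this value is not corrected for chance agreement, so it tends to look higher on datasets with few or imbalanced classes.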