ntqr.r2.datasketches

@author: Andrés Corrada-Emmanuel.

Attributes

Classes

TrioLabelVoteCounts

Data class for the by-label aligned votes of three binary classifiers.

TrioVoteCounts

Data class to validate the test counts for three binary classifiers.

ObservedVoteCounts

Data class to validate the test vote counts for an arbitrary number of binary classifiers.

Functions

classifier_label_votes(→ tuple[Votes, ...])

All the vote patterns where classifier voted with label.

votes_match(→ bool)

Test if the labels by classifiers match votes.

classifiers_labels_votes(→ tuple[Votes, ...])

Trio voting patterns that match labels by classifiers.

Module Contents

ntqr.r2.datasketches.Label
ntqr.r2.datasketches.Votes
ntqr.r2.datasketches.VoteCounts
ntqr.r2.datasketches.LabelVoteCounts
ntqr.r2.datasketches.VoteFrequencies
ntqr.r2.datasketches.opposite_label
ntqr.r2.datasketches.classifier_label_votes(classifier: int, label: Label, vote_patterns: Iterable[Votes]) tuple[Votes, ...]

All the vote patterns where classifier voted with label.

Parameters:
  • classifier (int) – Index of the classifier in the vote_patterns.

  • label (Label) – The classifier label vote.

  • vote_patterns (Iterable[Votes]) – The vote patterns to search.

Returns:

All the vote patterns where classifier voted with label.

Return type:

tuple[Votes, …]

ntqr.r2.datasketches.votes_match(votes: tuple[Label, ...], classifiers: Iterable[int], labels: Iterable[Label]) bool

Test if the labels by classifiers match votes.

Parameters:
  • votes (Tuple[Label, ...]) – Vote pattern for the trio.

  • classifiers (Iterable[int]) – Classifiers to check.

  • labels (Iterable[Label]) – Label voted by each classifier - the voting sub-pattern we want a trio pattern to match.

Return type:

bool
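The matching rule described above can be sketched as follows. This is an independent illustration, not the library source: the name votes_match_sketch is hypothetical, labels are assumed to be the strings 'a' and 'b', and classifiers and labels are assumed to have the same length.

```python
from typing import Iterable

Label = str  # assumption: labels are the strings 'a' and 'b'


def votes_match_sketch(
    votes: tuple[Label, ...],
    classifiers: Iterable[int],
    labels: Iterable[Label],
) -> bool:
    """Return True when each listed classifier cast its paired label."""
    return all(votes[c] == label for c, label in zip(classifiers, labels))


# Classifiers 0 and 2 both voted 'a' in the pattern ('a', 'b', 'a').
match_02 = votes_match_sketch(("a", "b", "a"), (0, 2), ("a", "a"))
# Classifier 1 voted 'b', so the sub-pattern ('a', 'a') on (0, 1) fails.
match_01 = votes_match_sketch(("a", "b", "a"), (0, 1), ("a", "a"))
print(match_02, match_01)
```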

ntqr.r2.datasketches.classifiers_labels_votes(classifiers: Iterable[int], labels: Iterable[Label], vote_patterns: Iterable[Votes]) tuple[Votes, ...]

Trio voting patterns that match labels by classifiers.

Parameters:
  • classifiers (Iterable[int]) – The indices of the classifiers in the trio to match.

  • labels (Iterable[Label]) – Voting sub-pattern by classifiers to match.

  • vote_patterns (Iterable[Votes]) – The trio voting patterns to search.

Returns:

Tuple of trio voting patterns that matched labels for the classifiers.

Return type:

tuple[Votes, …]
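As a rough illustration of the filtering described above, the sketch below selects the trio patterns that agree with a voting sub-pattern. The function name classifiers_labels_votes_sketch is hypothetical and the code is not taken from the library; it assumes the labels are the strings 'a' and 'b'.

```python
from itertools import product
from typing import Iterable

Label = str
Votes = tuple[Label, ...]


def classifiers_labels_votes_sketch(
    classifiers: Iterable[int],
    labels: Iterable[Label],
    vote_patterns: Iterable[Votes],
) -> tuple[Votes, ...]:
    """Keep the patterns in which every listed classifier cast its label."""
    # Materialize the pairs once: zip() is a single-pass iterator.
    pairs = tuple(zip(classifiers, labels))
    return tuple(
        votes
        for votes in vote_patterns
        if all(votes[c] == label for c, label in pairs)
    )


# All eight voting patterns of a trio of binary classifiers.
trio_patterns = tuple(product("ab", repeat=3))
# Patterns where classifier 0 voted 'a' and classifier 2 voted 'b'.
matched = classifiers_labels_votes_sketch((0, 2), ("a", "b"), trio_patterns)
print(matched)
```

Passing a single classifier and a single label reduces this to the behavior documented for classifier_label_votes.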

class ntqr.r2.datasketches.TrioLabelVoteCounts(label_vote_counts)

Data class for the by-label aligned votes of three binary classifiers.

This class is only useful in an experimental setting where one has observed a test with labeled data. Initialized with a Mapping[Label, Mapping[Votes, int]] of the form:

{‘a’: {(‘a’, ‘a’, ‘a’): int, …, (‘b’, ‘b’, ‘b’): int}, ‘b’: {(‘a’, ‘a’, ‘a’): int, …, (‘b’, ‘b’, ‘b’): int}}

DEPRECATED: This class is being replaced with the upcoming LabelVoteCounts for an arbitrary number of classifiers.

vote_patterns
pairs = ((0, 1), (0, 2), (1, 2))
label_vote_counts
to_vote_counts() VoteCounts

Turn by-label counts into by-vote-pattern counts.

Using {‘a’: {…, (‘a’, ‘b’, ‘a’): x, …}, ‘b’: {…, (‘a’, ‘b’, ‘a’): y, …}}

Returns:

{…, (‘a’, ‘b’, ‘a’): x+y, …}

Return type:

VoteCounts
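The summation across labels can be sketched with a Counter. This is an illustrative stand-in, not the library's implementation: to_vote_counts_sketch and the sample counts are hypothetical.

```python
from collections import Counter

# Hypothetical by-label counts; patterns absent from a label are omitted.
label_vote_counts = {
    "a": {("a", "b", "a"): 3, ("a", "a", "a"): 5},
    "b": {("a", "b", "a"): 2, ("b", "b", "b"): 4},
}


def to_vote_counts_sketch(by_label):
    """Sum the counts of each vote pattern over all true labels."""
    totals = Counter()
    for counts in by_label.values():
        totals.update(counts)  # adds counts pattern by pattern
    return dict(totals)


vote_counts = to_vote_counts_sketch(label_vote_counts)
print(vote_counts[("a", "b", "a")])  # x + y for this pattern
```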

to_TrioVoteCounts()

Return TrioVoteCounts object by summing votes across labels.

to_voting_frequency_fractions() VoteFrequencies

Compute observed voting pattern frequencies.

Return type:

Mapping[Votes, Fraction]

to_voting_frequencies_float() Mapping[Votes, float]

Compute observed voting frequencies inexactly, as floats.

Return type:

Mapping[Votes, float]

__getitem__(label)

Return the vote pattern counts for label.

Parameters:

label (Label) – One of ‘a’ or ‘b’.

Returns:

The aligned vote counts observed for the given label.

Return type:

Mapping[Votes, int]

flip_classifiers_label_decisions(classifiers: Iterable, label: Label)
class ntqr.r2.datasketches.TrioVoteCounts

Data class to validate the test counts for three binary classifiers.

Initialized with a Mapping[Votes, int] of the form:

{(‘a’, ‘a’, ‘a’): int, …, (‘b’, ‘b’, ‘b’):int}

This is the class used for evaluation on unlabeled data, where we only have access to the aligned decisions of the binary classifiers and have no knowledge of the true label of any item that was classified.

DEPRECATED: This class will be replaced with the ObservedVoteCounts class that can handle an arbitrary number of classifiers.

vote_patterns
pairs = ((0, 1), (0, 2), (1, 2))
vote_counts: VoteCounts
__post_init__()

Check we have counts for a valid evaluation of binary classifiers.

  1. No negative counts.

  2. Initialize all possible vote patterns by the trio.

  3. The empty test - all counts zero - is not allowed.
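The three checks listed above could look roughly like the sketch below. It is written under assumptions: the function name validate_trio_counts_sketch is hypothetical, the labels are taken to be 'a' and 'b', and raising ValueError is a choice made here, since the exception the library raises is not stated in this page.

```python
from itertools import product


def validate_trio_counts_sketch(vote_counts):
    """Return counts for all eight trio patterns, or raise ValueError."""
    # 1. No negative counts.
    if any(count < 0 for count in vote_counts.values()):
        raise ValueError("vote counts must be non-negative")
    # 2. Initialize every possible pattern, defaulting missing ones to 0.
    full = {
        pattern: vote_counts.get(pattern, 0)
        for pattern in product("ab", repeat=3)
    }
    # 3. The empty test, all counts zero, is not allowed.
    if sum(full.values()) == 0:
        raise ValueError("at least one vote pattern must be observed")
    return full


checked = validate_trio_counts_sketch({("a", "a", "a"): 2, ("b", "a", "b"): 1})
print(len(checked))
```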

to_frequencies_exact() VoteFrequencies

Turn vote integer counts to exact Fraction objects.

Returns:

Maps a trio vote pattern to its Fraction occurrence in the test.

Return type:

VoteFrequencies

to_frequencies_float() Mapping[Votes, float]

Compute observed voting pattern frequencies, inexactly, as floats.

Returns:

Maps a trio vote pattern to its percentage occurrence in the test as an inexact float.

Return type:

Mapping[Votes, float]
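To make the difference between the exact and float frequencies concrete, here is a small sketch using the standard library. The sample counts are hypothetical, and the code only mirrors the documented behavior; it is not the library's implementation.

```python
from fractions import Fraction

# Hypothetical observed counts for a test of size 8.
vote_counts = {("a", "a", "a"): 3, ("a", "b", "a"): 1, ("b", "b", "b"): 4}
test_size = sum(vote_counts.values())

# Exact frequencies: each count divided by the test size, kept as a Fraction.
exact = {votes: Fraction(count, test_size) for votes, count in vote_counts.items()}

# Inexact frequencies: the same ratios as floats.
approx = {votes: count / test_size for votes, count in vote_counts.items()}

print(exact[("a", "a", "a")], approx[("a", "a", "a")])
```

Exact fractions sum to exactly 1, which matters when the frequencies feed into algebraic computations downstream.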

classifier_label_frequency(classifier: int, label: Label) sympy.Rational

Calculate classifier label voting frequency.

Parameters:
  • classifier (int) – The index of the classifier.

  • label (Label) – The label.

Returns:

The fraction of times the classifier voted the label when classifying items in the test, that is, label_vote_counts / test_size.

Return type:

sympy.Rational

classifier_label_responses(classifier: int, label: Label) int

Calculates number of responses with label by classifier.

Parameters:
  • classifier (int) – The index of the classifier.

  • label (Label) – The label.

Returns:

Number of times the classifier decided an item was label.

Return type:

int
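The response count and the corresponding frequency can be sketched as below. The function names and sample counts are hypothetical, and fractions.Fraction stands in for sympy.Rational so the example runs with the standard library alone.

```python
from fractions import Fraction

# Hypothetical observed counts for a test of size 8.
vote_counts = {("a", "a", "a"): 3, ("a", "b", "a"): 1, ("b", "b", "b"): 4}


def classifier_label_responses_sketch(vote_counts, classifier, label):
    """Number of items on which `classifier` voted `label`."""
    return sum(
        count for votes, count in vote_counts.items() if votes[classifier] == label
    )


def classifier_label_frequency_sketch(vote_counts, classifier, label):
    """Responses with `label` divided by the test size, as an exact ratio."""
    test_size = sum(vote_counts.values())
    responses = classifier_label_responses_sketch(vote_counts, classifier, label)
    return Fraction(responses, test_size)


responses_1b = classifier_label_responses_sketch(vote_counts, 1, "b")
frequency_1b = classifier_label_frequency_sketch(vote_counts, 1, "b")
print(responses_1b, frequency_1b)
```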

pair_label_frequency(pair: Iterable[int], label: Label) sympy.Rational

Compute frequency of times a pair voted with the same label.

Parameters:
  • pair (Iterable[int, int]) – Classifier indices.

  • label (Label) – The label.

Returns:

The fraction of times a pair of classifiers voted with the same label when classifying items in the test.

Return type:

sympy.Rational

pair_label_responses(pair: Iterable[int], label: Label) sympy.Rational

Computes number of times a pair voted with the same label.

Parameters:
  • pair (Iterable[int, int]) – Classifier indices.

  • label (Label) – The label.

Returns:

Number of items pair voted with the same label.

Return type:

int

pair_frequency_moment(pair: Iterable[int], label: Label) sympy.Rational

Calculate the label classifier pair frequency moment.

If (i, j) = pair, then this is:

f_{label_i, label_j} - f_{label_i} * f_{label_j}

The fraction of times the classifier pair voted with the same label minus the product of their individual label voting frequencies.

Parameters:
  • pair (Iterable(int, int)) – The pair of classifiers.

  • label (Label) – One of the binary labels.

Returns:

The pair frequency moment as an exact rational number.

Return type:

sympy.Rational
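A worked instance of the formula above, as a sketch only: pair_frequency_moment_sketch and the sample counts are hypothetical, and fractions.Fraction is used in place of sympy.Rational to keep the example dependency-free.

```python
from fractions import Fraction

# Hypothetical observed counts for a test of size 10.
vote_counts = {
    ("a", "a", "a"): 4,
    ("a", "a", "b"): 1,
    ("a", "b", "a"): 1,
    ("b", "a", "a"): 1,
    ("b", "b", "b"): 3,
}


def frequency(vote_counts, classifiers, label):
    """Fraction of items on which all listed classifiers voted `label`."""
    test_size = sum(vote_counts.values())
    hits = sum(
        count
        for votes, count in vote_counts.items()
        if all(votes[c] == label for c in classifiers)
    )
    return Fraction(hits, test_size)


def pair_frequency_moment_sketch(vote_counts, pair, label):
    """f_{label_i, label_j} - f_{label_i} * f_{label_j} for (i, j) = pair."""
    i, j = pair
    joint = frequency(vote_counts, (i, j), label)
    return joint - frequency(vote_counts, (i,), label) * frequency(
        vote_counts, (j,), label
    )


# Joint frequency 5/10, individual frequencies 6/10 each: 1/2 - 9/25.
moment_01_a = pair_frequency_moment_sketch(vote_counts, (0, 1), "a")
print(moment_01_a)
```

A moment of zero would mean the pair's agreement on the label is exactly what independent voting at their individual frequencies would produce.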

label_pairs_frequency_moments(label: Label) Mapping[tuple[int, int], sympy.Rational]

All the label pair frequency moments.

trio_frequency_moment() sympy.Rational

Calculate the 3rd frequency moment of a trio of binary classifiers.

Don’t ask.

class ntqr.r2.datasketches.ObservedVoteCounts

Data class to validate the test vote counts for an arbitrary number of binary classifiers.

Initialized with a Mapping[Votes, int] of the form:

{(‘a’, …, ‘a’): int, …, (‘b’, …, ‘b’): int}

Class used during evaluations with unlabeled data where we do not know the true label for each item labeled by the classifiers.

vote_counts: VoteCounts
__post_init__()

Check we have counts for a valid evaluation of binary classifiers.

  1. Determine the size of the ensemble and that all provided vote patterns conform to it.

  2. No negative counts.

  3. Initialize all possible vote patterns by the ensemble.

  4. The empty test - all counts zero - is not allowed.
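The extra step for an ensemble of arbitrary size, inferring the number of classifiers from the supplied patterns, might be sketched as follows. The function name validate_observed_counts_sketch is hypothetical, the labels are assumed to be 'a' and 'b', and the use of ValueError is an assumption made for this illustration.

```python
from itertools import product


def validate_observed_counts_sketch(vote_counts):
    """Return counts for every pattern of the inferred ensemble size."""
    # 1. All provided patterns must share one length: the ensemble size.
    sizes = {len(votes) for votes in vote_counts}
    if len(sizes) != 1:
        raise ValueError("vote patterns must all have the same, non-zero count")
    (ensemble_size,) = sizes
    # 2. No negative counts.
    if any(count < 0 for count in vote_counts.values()):
        raise ValueError("vote counts must be non-negative")
    # 3. Initialize every possible pattern, defaulting missing ones to 0.
    full = {
        pattern: vote_counts.get(pattern, 0)
        for pattern in product("ab", repeat=ensemble_size)
    }
    # 4. The empty test, all counts zero, is not allowed.
    if sum(full.values()) == 0:
        raise ValueError("at least one vote pattern must be observed")
    return full


pair_counts = validate_observed_counts_sketch({("a", "b"): 2})
print(len(pair_counts))
```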

to_frequencies_exact() VoteFrequencies

Turn vote integer counts to exact Fraction objects.

Returns:

Maps a vote pattern to its Fraction occurrence in the test.

Return type:

VoteFrequencies

to_frequencies_float() Mapping[Votes, float]

Compute observed voting pattern frequencies, inexactly, as floats.

Returns:

Maps a vote pattern to its percentage occurrence in the test as an inexact float.

Return type:

Mapping[Votes, float]

classifier_label_frequency(classifier: int, label: Label) sympy.Rational

Calculate classifier label voting frequency.

Parameters:
  • classifier (int) – The index of the classifier.

  • label (Label) – The label.

Returns:

The fraction of times the classifier voted the label when classifying items in the test, that is, label_vote_counts / test_size.

Return type:

sympy.Rational

classifier_label_responses(classifier: int, label: Label) int

Calculates number of responses with label by classifier.

Parameters:
  • classifier (int) – The index of the classifier.

  • label (Label) – The label.

Returns:

Number of times the classifier decided an item was label.

Return type:

int

pair_label_frequency(pair: Iterable[int], label: Label) sympy.Rational

Compute frequency of times a pair voted with the same label.

Parameters:
  • pair (Iterable[int, int]) – Classifier indices.

  • label (Label) – The label.

Returns:

The fraction of times a pair of classifiers voted with the same label when classifying items in the test.

Return type:

sympy.Rational

pair_label_responses(pair: Iterable[int], label: Label) sympy.Rational

Computes number of times a pair voted with the same label.

Parameters:
  • pair (Iterable[int, int]) – Classifier indices.

  • label (Label) – The label.

Returns:

Number of items pair voted with the same label.

Return type:

int

pair_frequency_moment(pair: Iterable[int], label: Label) sympy.Rational

Calculate the label classifier pair frequency moment.

If (i, j) = pair, then this is:

f_{label_i, label_j} - f_{label_i} * f_{label_j}

The fraction of times the classifier pair voted with the same label minus the product of their individual label voting frequencies.

Parameters:
  • pair (Iterable(int, int)) – The pair of classifiers.

  • label (Label) – One of the binary labels.

Returns:

The pair frequency moment as an exact rational number.

Return type:

sympy.Rational

label_pairs_frequency_moments(label: Label) Mapping[tuple[int, int], sympy.Rational]

All the label pair frequency moments.

trio_frequency_moment() sympy.Rational

Calculate the 3rd frequency moment of a trio of binary classifiers.

Don’t ask.