ntqr

Evaluate noisy decision makers using logic and algebra.

Classes:

Functions:

Misc variables:

__version__ uci_adult_test_example

Submodules

Attributes

Classes

TrioLabelVoteCounts

Data class for the by-label aligned votes of three binary classifiers.

TrioVoteCounts

Data class to validate the test counts for three binary classifiers.

ErrorIndependentEvaluation

Evaluate three binary classifiers assuming they are error independent.

MajorityVotingEvaluation

Evaluate three binary classifiers using majority voting.

SupervisedEvaluation

Evaluation for experiments where the true labels are known.

Label

Label object to guarantee a label is stringifiable.

Labels

Labels used in test question responses. The NTQR package assumes that all test questions have the same, fixed, label set.

Functions

Package Contents

ntqr.__version__ = '0.8'
ntqr.uciadult_label_counts: ntqr.r2.datasketches.LabelVoteCounts
class ntqr.TrioLabelVoteCounts(label_vote_counts)

Data class for the by-label aligned votes of three binary classifiers.

This class is only useful in an experimental setting where one has observed a test with labeled data. Initialized with a Mapping[Label, Mapping[Votes, int]] of the form:

{‘a’: {(‘a’, ‘a’, ‘a’): int, …, (‘b’, ‘b’, ‘b’): int}, ‘b’: {(‘a’, ‘a’, ‘a’): int, …, (‘b’, ‘b’, ‘b’): int}}

DEPRECATED: This class is being replaced with the upcoming LabelVoteCounts for an arbitrary number of classifiers.

vote_patterns
pairs = ((0, 1), (0, 2), (1, 2))
label_vote_counts
to_vote_counts() VoteCounts

Turn by-label counts into by-vote-pattern counts.

Using:

{‘a’: {…, (‘a’, ‘b’, ‘a’): x, …}, ‘b’: {…, (‘a’, ‘b’, ‘a’): y, …}}

Returns:

{…, (‘a’, ‘b’, ‘a’): x+y, …}

Return type:

VoteCounts
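The summation described above can be sketched in plain Python. This is an illustration only, not the package's implementation; the function name `sum_over_labels` and the sample counts are assumptions made for the example.

```python
from collections import Counter


def sum_over_labels(label_vote_counts):
    """Add up the counts of each vote pattern across the true labels."""
    totals = Counter()
    for pattern_counts in label_vote_counts.values():
        # Counter.update adds counts instead of overwriting them.
        totals.update(pattern_counts)
    return dict(totals)


# Hypothetical by-label counts for two of the eight vote patterns.
by_label = {
    "a": {("a", "b", "a"): 5, ("a", "a", "a"): 7},
    "b": {("a", "b", "a"): 2, ("a", "a", "a"): 1},
}
by_pattern = sum_over_labels(by_label)
```

Here the pattern (‘a’, ‘b’, ‘a’) was seen x = 5 times on ‘a’ items and y = 2 times on ‘b’ items, so its by-vote-pattern count is 7.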

to_TrioVoteCounts()

Return TrioVoteCounts object by summing votes across labels.

to_voting_frequency_fractions() VoteFrequencies

Compute observed voting pattern frequencies.

Return type:

Mapping[Votes, Fraction]

to_voting_frequencies_float() Mapping[Votes, float]

Compute observed voting frequencies inexactly, as floats.

Return type:

Mapping[Votes, float]

__getitem__(label)

Return the vote pattern counts for label.

Parameters:

label (Label) – One of ‘a’ or ‘b’.

Returns:

The aligned vote counts observed for the given label.

Return type:

Mapping[Votes, int]

flip_classifiers_label_decisions(classifiers: Iterable, label: Label)
class ntqr.TrioVoteCounts

Data class to validate the test counts for three binary classifiers.

Initialized with a Mapping[Votes, int] of the form:

{(‘a’, ‘a’, ‘a’): int, …, (‘b’, ‘b’, ‘b’):int}

This is the class used for evaluation on unlabeled data, where we only have access to the aligned decisions of the binary classifiers and have no knowledge of the true label of any item that was classified.

DEPRECATED: This class will be replaced with the ObservedVoteCounts class that can handle an arbitrary number of classifiers.

vote_patterns
pairs = ((0, 1), (0, 2), (1, 2))
vote_counts: VoteCounts
__post_init__()

Check we have counts for a valid evaluation of binary classifiers.

  1. No negative counts.

  2. Initialize all possible vote patterns by the trio.

  3. The empty test - all counts zero - is not allowed.
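The three checks listed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the package's code: `validate_trio_counts` is a hypothetical helper, and raising `ValueError` is a choice made for the example.

```python
from itertools import product


def validate_trio_counts(vote_counts, labels=("a", "b")):
    """Return a complete count mapping for a trio, or raise ValueError."""
    # 1. No negative counts.
    if any(count < 0 for count in vote_counts.values()):
        raise ValueError("vote counts must be non-negative")
    # 2. Fill in every one of the 2**3 = 8 patterns, defaulting to zero.
    full = {
        pattern: vote_counts.get(pattern, 0)
        for pattern in product(labels, repeat=3)
    }
    # 3. The empty test is not allowed.
    if sum(full.values()) == 0:
        raise ValueError("the empty test (all counts zero) is not allowed")
    return full


checked = validate_trio_counts({("a", "a", "b"): 3, ("b", "b", "b"): 1})
```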

to_frequencies_exact() VoteFrequencies

Turn vote integer counts to exact Fraction objects.

Returns:

Maps a trio vote pattern to its Fraction occurrence in the test.

Return type:

VoteFrequencies
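The difference between exact and float frequencies can be shown with the standard library. The sketch below uses `fractions.Fraction` and a hypothetical `exact_frequencies` helper; the counts are invented for the example.

```python
from fractions import Fraction


def exact_frequencies(vote_counts):
    """Divide each pattern count by the test size, exactly."""
    test_size = sum(vote_counts.values())
    return {
        pattern: Fraction(count, test_size)
        for pattern, count in vote_counts.items()
    }


# Hypothetical counts for three of the eight vote patterns.
counts = {("a", "a", "a"): 3, ("a", "b", "a"): 1, ("b", "b", "b"): 4}
freqs = exact_frequencies(counts)
# The inexact version is a conversion of the exact one.
floats = {pattern: float(freq) for pattern, freq in freqs.items()}
```

Keeping the frequencies exact means they sum to precisely 1, which matters when later algebra depends on detecting exact square roots.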

to_frequencies_float() Mapping[Votes, float]

Compute observed voting pattern frequencies, inexactly, as floats.

Returns:

Maps a trio vote pattern to its frequency of occurrence in the test, as an inexact float.

Return type:

Mapping[Votes, float]

classifier_label_frequency(classifier: int, label: Label) sympy.Rational

Calculate classifier label voting frequency.

Parameters:
  • classifier (int) – The index of the classifier.

  • label (Label) – The label.

Returns:

The fraction of times the classifier voted the label when classifying items in the test.

Return type:

sympy.Rational(label_vote_counts, test_size)

classifier_label_responses(classifier: int, label: Label) int

Calculates number of responses with label by classifier.

Parameters:
  • classifier (int) – The index of the classifier.

  • label (Label) – The label.

Returns:

Number of times the classifier decided an item was label.

Return type:

int

pair_label_frequency(pair: Iterable[int], label: Label) sympy.Rational

Compute frequency of times a pair voted with the same label.

Parameters:
  • pair (Iterable[int, int]) – Classifier indices.

  • label (Label) – The label.

Returns:

The fraction of times a pair of classifiers voted with the same label when classifying items in the test.

Return type:

sympy.Rational

pair_label_responses(pair: Iterable[int], label: Label) sympy.Rational

Computes number of times a pair voted with the same label.

Parameters:
  • pair (Iterable[int, int]) – Classifier indices.

  • label (Label) – The label.

Returns:

Number of items on which the pair voted with the same label.

Return type:

int

pair_frequency_moment(pair: Iterable[int], label: Label) sympy.Rational

Calculate the label classifier pair frequency moment.

If (i, j) = pair, then this is:

f_{label_i, label_j} - f_{label_i} * f_{label_j}

That is, the fraction of times the classifier pair voted with the same label minus the product of their individual label voting frequencies.

Parameters:
  • pair (Iterable(int, int)) – The pair of classifiers.

  • label (Label) – One of the binary labels.

Returns:

The pair frequency moment as a Fraction object

Return type:

sympy.Rational
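The moment formula can be worked through by hand with a few lines of Python. The sketch below is an illustration, not the package's implementation: it uses `fractions.Fraction` in place of `sympy.Rational`, the helper functions are hypothetical stand-ins, and the counts are invented.

```python
from fractions import Fraction


def label_frequency(vote_counts, classifier, label):
    """Fraction of items on which one classifier voted `label`."""
    total = sum(vote_counts.values())
    hits = sum(n for votes, n in vote_counts.items()
               if votes[classifier] == label)
    return Fraction(hits, total)


def pair_label_frequency(vote_counts, pair, label):
    """Fraction of items on which both classifiers voted `label`."""
    i, j = pair
    total = sum(vote_counts.values())
    hits = sum(n for votes, n in vote_counts.items()
               if votes[i] == label and votes[j] == label)
    return Fraction(hits, total)


def pair_frequency_moment(vote_counts, pair, label):
    """f_{label_i, label_j} - f_{label_i} * f_{label_j}."""
    i, j = pair
    return (pair_label_frequency(vote_counts, pair, label)
            - label_frequency(vote_counts, i, label)
            * label_frequency(vote_counts, j, label))


# Hypothetical test of 10 items.
counts = {
    ("a", "a", "a"): 4, ("a", "a", "b"): 1, ("a", "b", "a"): 1,
    ("b", "a", "a"): 1, ("b", "b", "b"): 3,
}
moment = pair_frequency_moment(counts, (0, 1), "a")
```

With these counts, classifiers 0 and 1 each vote ‘a’ on 6 of 10 items and agree on ‘a’ for 5 of 10, so the moment is 1/2 - (3/5)(3/5) = 7/50.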

label_pairs_frequency_moments(label: Label) Mapping[tuple[int, int], sympy.Rational]

All the label pair frequency moments.

trio_frequency_moment() sympy.Rational

Calculate the 3rd frequency moment of a trio of binary classifiers.

Don’t ask.

class ntqr.ErrorIndependentEvaluation(vote_counts: ntqr.r2.datasketches.TrioVoteCounts)

Evaluate three binary classifiers assuming they are error independent.

Absent labeled data, there are two logically consistent solutions given only the classifiers' decision voting frequencies. For binary classification, this means that there are 2 points in evaluation space that can possibly explain the test results. The ground truth evaluation is one of these points, provided the assumption of error independence is true.

The exact algebraic solution has a virtue that few alarm systems have: it can warn about the failure of its own assumption of error independence. If the two possible solutions for the ‘a’ label prevalence contain an unresolved integer square root, the classifiers are error correlated in the evaluation.

In version 0.2 the math needed to handle the almost certain detection of error correlation will be added. It is already being built, as can be seen in the ntqr.r2.postulates file, where the postulates related to computing the error correlation have been expressed using the SymPy package.

Warnings

A. The ntqr package uses a notion of ‘error independence’ that is different from the one most familiar in the ML/AI community. There are many notions of independence in mathematics. In the context of ML/AI papers and discussions, the term ‘error independence’ is taken to be

A.1. Functional independence of distributions (P(x, y) = P(x)P(y))

The one used in the ntqr package is sample defined, since there is no probability theory used in its logic. For that reason, you must define a set of error correlation parameters. ‘Error independence’ in the ntqr package means

A.2. pair_label_correlations = 0, trio_label_correlations = 0, …

It is best to think of ‘error independence’ in the ntqr package as a property that belongs to the classifiers AND the test they took.

B. This class currently assumes that the observed classifier vote counts supplied by the user are not fake. The set of all valid observations from a classification test is much smaller than the set of all sets of eight positive integers. Future versions of the ntqr package will implement the algebraic geometry needed to detect when TrioVoteCounts objects are not explainable as observations from a classification test.

The error independent solution can fail if, in fact, the classifiers are highly correlated on the test being evaluated. Tests can fail. Future versions will implement the exceptions:

  1. PrevalenceImaginaryException

  2. NoSolutionException

The PrevalenceImaginaryException is an iron-clad detection of highly correlated classifiers. Its main utility will be in “warning light” applications in AI safety.

The NoSolutionException means that NO independent system can possibly explain the observations. There are two different reasons for this: higher error correlations, or a fake data sketch. Distinguishing between the two comes down to the same computation of error correlation.

vote_counts
vote_frequencies
evaluation_exact
evaluation_float
alpha_prevalence_quadratic_terms()

Calculate the coefficients of the ‘a’ label prevalence quadratic.

If the quadratic is represented as:

a * (P_a)**2 + b * P_a + c

then,

a = terms[2], b = terms[1], c = terms[0].

The quadratic is written in the ‘standard’ way seen in algebra textbooks. Be careful to not mistake the ‘a’ or ‘b’ coefficients described here with the two labels being used for classification - currently implemented as (‘a’, ‘b’).
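To make the coefficient ordering concrete, the sketch below solves a quadratic given as terms = [c, b, a] with exact arithmetic and reports whether the square root of the discriminant resolves to a rational number. The coefficient values are invented and do not come from any test, the helper names are hypothetical, and `fractions.Fraction` stands in for the SymPy expressions that the signatures in this module suggest.

```python
from fractions import Fraction
from math import isqrt


def rational_sqrt(value):
    """Return the exact square root of a Fraction, or None if it has none."""
    if value < 0:
        return None
    # A Fraction is stored in lowest terms, so it is a perfect square
    # exactly when its numerator and denominator both are.
    num_root, den_root = isqrt(value.numerator), isqrt(value.denominator)
    if num_root**2 == value.numerator and den_root**2 == value.denominator:
        return Fraction(num_root, den_root)
    return None


def solve_quadratic(terms):
    """Solve a*P**2 + b*P + c = 0 where c, b, a = terms[0], terms[1], terms[2]."""
    c, b, a = (Fraction(t) for t in terms)
    root = rational_sqrt(b * b - 4 * a * c)
    if root is None:
        return None  # the square root does not resolve
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


# Hypothetical coefficients: 25*P**2 - 25*P + 4 has roots 1/5 and 4/5.
solutions = solve_quadratic([4, -25, 25])
# P**2 - 3*P + 1 has discriminant 5, whose square root is irrational.
unresolved = solve_quadratic([1, -3, 1])
```

In the second case the discriminant is not a perfect square, which corresponds to the unresolved square root that the class description treats as a sign of error correlation.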

alpha_prevalence_estimates()

Calculate the prevalence of the alpha label.

Since the solutions of the quadratic equation are ordered by the plus/minus operations, we arbitrarily return first the solution in which the ‘a’ label prevalence is less than 50%.

classifier_a_label_accuracy(classifier: int, a_prevalence)

Calculate classifier ‘a’ label accuracies.

Parameters:
  • classifier (int) – One of (0, 1, 2).

  • a_prevalence (Sympy expression)

Returns:

The ‘a’ label accuracy given the a_prevalence value.

classifier_b_label_accuracy(classifier: int, a_prevalence)

Calculate classifier ‘b’ label accuracies.

Parameters:

classifier (int) – One of (0, 1, 2).

Returns:

Two possible logically consistent estimates for P_{i,b} given the test error independence assumption.

class ntqr.MajorityVotingEvaluation(vote_counts: ntqr.r2.datasketches.TrioVoteCounts)

Evaluate three binary classifiers using majority voting.

Majority voting can be used to carry out evaluation algebraically. Typically, majority voting is used with the assumption that the crowd is always right. In the context of safety, however, the assumption that the crowd is always wrong is equally valid a priori. Hence, this class returns TWO evaluations: the first assumes the crowd is always right and the second assumes it is always wrong. Its main virtue is that it is simple and rock solid: it always returns logically consistent evaluations.
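One way to read the two assumptions is sketched below: each vote pattern is graded against its majority label, or against the opposite label when the crowd is assumed wrong. This is an interpretation for illustration, not the class's implementation. The `flip` argument borrows its name from the method signatures listed for this class, but its meaning here is an assumption, and the counts are invented.

```python
from fractions import Fraction

LABELS = ("a", "b")


def majority_label(votes):
    """Label chosen by at least two of the three classifiers."""
    return max(LABELS, key=votes.count)


def majority_evaluation(vote_counts, flip=False):
    """Grade the trio against the majority vote, or its opposite if flip."""
    total = sum(vote_counts.values())
    label_totals = {label: 0 for label in LABELS}
    correct = [{label: 0 for label in LABELS} for _ in range(3)]
    for votes, n in vote_counts.items():
        answer = majority_label(votes)
        if flip:
            answer = LABELS[1 - LABELS.index(answer)]
        label_totals[answer] += n
        for i, vote in enumerate(votes):
            if vote == answer:
                correct[i][answer] += n
    prevalences = {lb: Fraction(label_totals[lb], total) for lb in LABELS}
    # Accuracy on a label is only defined if some item was assigned to it.
    accuracies = [
        {lb: Fraction(correct[i][lb], label_totals[lb])
         for lb in LABELS if label_totals[lb]}
        for i in range(3)
    ]
    return prevalences, accuracies


# Hypothetical test of 10 items.
counts = {("a", "a", "a"): 4, ("a", "a", "b"): 2,
          ("a", "b", "b"): 1, ("b", "b", "b"): 3}
right_prev, right_acc = majority_evaluation(counts)
wrong_prev, wrong_acc = majority_evaluation(counts, flip=True)
```

Under "crowd is right" the ‘a’ prevalence is 3/5; under "crowd is wrong" it becomes 2/5, and every classifier's accuracies change accordingly.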

vote_patterns
vote_counts
vote_frequencies
labels = ('a', 'b')
majority_right_vote_patterns
majority_wrong_vote_patterns
evaluation_exact
evaluation_float
compute_vote_pattern_evaluation(vote_patterns, flip)
prevalences(vote_patterns)

Compute label prevalences in the test.

classifier_label_accuracy(classifier, vote_patterns, label, flip)

Compute the label accuracy for classifier.

to_float(sol)
class ntqr.SupervisedEvaluation(label_counts: ntqr.r2.datasketches.TrioLabelVoteCounts)

Evaluation for experiments where the true labels are known.

vote_patterns
pairs = ((0, 1), (0, 2), (1, 2))
label_counts
evaluation_exact
evaluation_float
prevalences()

Calculate the prevalences of the two labels.

Returns:

Mapping from each label to the fraction of test items that carry it.

Return type:

Mapping[Label, Fraction]

classifier_label_accuracy(classifier: int, label: ntqr.r2.datasketches.Label)

Compute classifier label accuracy.
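When the true labels are known, prevalence and label accuracy reduce to simple ratios of counts. The sketch below illustrates this with `fractions.Fraction`; the helper functions are hypothetical stand-ins for the methods above, and the counts are invented.

```python
from fractions import Fraction


def prevalences(label_counts):
    """Fraction of test items that carry each true label."""
    sizes = {label: sum(counts.values())
             for label, counts in label_counts.items()}
    total = sum(sizes.values())
    return {label: Fraction(size, total) for label, size in sizes.items()}


def label_accuracy(label_counts, classifier, label):
    """Fraction of `label` items on which the classifier voted `label`."""
    counts = label_counts[label]
    correct = sum(n for votes, n in counts.items()
                  if votes[classifier] == label)
    return Fraction(correct, sum(counts.values()))


# Hypothetical by-label counts: 8 'a' items and 12 'b' items.
by_label = {
    "a": {("a", "a", "a"): 5, ("a", "b", "a"): 2, ("b", "b", "a"): 1},
    "b": {("b", "b", "b"): 9, ("a", "b", "b"): 3},
}
prev = prevalences(by_label)
acc = label_accuracy(by_label, 1, "a")
```

Classifier 1 votes ‘a’ on only 5 of the 8 ‘a’ items, so its ‘a’ label accuracy is 5/8.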

other_label(label: ntqr.r2.datasketches.Label)

Return the other binary classification label given label.

pair_label_error_correlation(pair, label)

Calculate the label error correlation of a classifier pair.

three_way_label_error_correlation(triplet, label)

Calculate the three-way label error correlation of a classifier triplet.

class ntqr.Label(label)

Bases: str

Label object to guarantee a label is stringifiable.

_label
__str__()

Return str(self).

class ntqr.Labels(labels: Iterable[str])

Bases: tuple

Labels used in test question responses. The NTQR package assumes that all test questions have the same, fixed, label set. These are a tuple-like object so the user can specify the canonical order of the labels.

The number of labels defines the integer R in the acronym NTQR. Binary classification: R=2; three-label classification: R=3, etc.
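As a small illustration of how the label set fixes R, the sketch below uses a plain tuple rather than the Labels class itself. The three labels and the number of classifiers N are assumptions made for the example.

```python
from itertools import product

labels = ("a", "b", "c")   # hypothetical three-label test
R = len(labels)            # the R in NTQR
N = 3                      # number of classifiers (assumed)

# Every aligned vote pattern of N classifiers, in the canonical label order.
vote_patterns = list(product(labels, repeat=N))
```

There are R**N possible vote patterns, here 27, and the order of the tuple determines the order in which they are enumerated.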

_labels
ntqr.copy_notebooks()