ntqr.r2.evaluators

Evaluators for binary classification tests (R=2).

Classes:

SupervisedEvaluation
ErrorIndependentEvaluation
MajorityVotingEvaluation

Functions:

Misc variables:

Attributes

Classes

SupervisedEvaluation

Evaluation for experiments where the true labels are known.

ErrorIndependentEvaluation

Evaluate three binary classifiers assuming they are error independent.

MajorityVotingEvaluation

Evaluate three binary classifiers using majority voting.

Functions

turn_numerical(val)

Turns exact values into numerical ones.

Module Contents

ntqr.r2.evaluators.turn_numerical(val)

Turns exact values into numerical ones.

Parameters:

val (TYPE) – DESCRIPTION.

Return type:

None.

class ntqr.r2.evaluators.SupervisedEvaluation(label_counts: ntqr.r2.datasketches.TrioLabelVoteCounts)

Evaluation for experiments where the true labels are known.

vote_patterns
pairs = ((0, 1), (0, 2), (1, 2))
label_counts
evaluation_exact
evaluation_float
prevalences()

Calculate the prevalences of the two labels.

Returns:

Mapping from labels to percentage of appearance in the test.

Return type:

Mapping[Label, Fraction]
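As a minimal sketch of the idea (not the library's implementation), label prevalences can be computed exactly with the standard-library Fraction type. The nested dictionary used here for by-label vote counts is a hypothetical stand-in for TrioLabelVoteCounts, whose real layout is not shown on this page.

```python
from fractions import Fraction

# Hypothetical by-label vote counts: for each true label, the number of test
# items that received each (classifier 0, classifier 1, classifier 2) pattern.
label_counts = {
    "a": {("a", "a", "a"): 5, ("a", "a", "b"): 2, ("b", "b", "b"): 1},
    "b": {("b", "b", "b"): 9, ("a", "b", "b"): 2, ("a", "a", "a"): 1},
}


def prevalences(counts):
    """Return each label's exact share of the test as a Fraction."""
    per_label = {label: sum(patterns.values()) for label, patterns in counts.items()}
    total = sum(per_label.values())
    return {label: Fraction(n, total) for label, n in per_label.items()}


print(prevalences(label_counts))
```

With 8 items labeled ‘a’ and 12 labeled ‘b’, the sketch yields prevalences of 2/5 and 3/5.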

classifier_label_accuracy(classifier: int, label: ntqr.r2.datasketches.Label)

Compute classifier label accuracy.
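One way to picture a label accuracy when the true labels are known is as the fraction of items with a given true label on which the classifier voted for that label. The sketch below assumes a hypothetical nested-dictionary layout for the by-label vote counts; it is an illustration, not the package's code.

```python
from fractions import Fraction

# Hypothetical by-label vote counts (true label -> vote pattern -> count).
label_counts = {
    "a": {("a", "a", "a"): 5, ("a", "a", "b"): 2, ("b", "b", "b"): 1},
    "b": {("b", "b", "b"): 9, ("a", "b", "b"): 2, ("a", "a", "a"): 1},
}


def classifier_label_accuracy(counts, classifier, label):
    """Fraction of items with true `label` that `classifier` voted correctly."""
    patterns = counts[label]
    total = sum(patterns.values())
    correct = sum(n for votes, n in patterns.items() if votes[classifier] == label)
    return Fraction(correct, total)


print(classifier_label_accuracy(label_counts, 0, "a"))
print(classifier_label_accuracy(label_counts, 0, "b"))
```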

other_label(label: ntqr.r2.datasketches.Label)

Return the other binary classification label given label.

pair_label_error_correlation(pair, label)

Calculate the label error correlation of a classifier pair.
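The following sketch assumes that a sample-defined pair error correlation for a label is the sample covariance of the two classifiers' correctness indicators on items with that label: the fraction of items both got right, minus the product of their label accuracies. That definition and the nested-dictionary data layout are assumptions made for illustration; the package's own formula is not reproduced on this page.

```python
from fractions import Fraction

# Hypothetical by-label vote counts (true label -> vote pattern -> count).
label_counts = {
    "a": {("a", "a", "a"): 5, ("a", "a", "b"): 2, ("b", "b", "b"): 1},
    "b": {("b", "b", "b"): 9, ("a", "b", "b"): 2, ("a", "a", "a"): 1},
}


def label_accuracy(counts, classifier, label):
    """Fraction of items with true `label` that `classifier` voted correctly."""
    patterns = counts[label]
    correct = sum(n for votes, n in patterns.items() if votes[classifier] == label)
    return Fraction(correct, sum(patterns.values()))


def pair_label_error_correlation(counts, pair, label):
    """Assumed definition: sample covariance of the pair's correctness."""
    i, j = pair
    patterns = counts[label]
    both_correct = sum(
        n for votes, n in patterns.items() if votes[i] == label and votes[j] == label
    )
    joint = Fraction(both_correct, sum(patterns.values()))
    return joint - label_accuracy(counts, i, label) * label_accuracy(counts, j, label)


print(pair_label_error_correlation(label_counts, (0, 1), "a"))
```

Under this definition a value of zero for every pair and label corresponds to the pair_label_correlations = 0 condition.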

three_way_label_error_correlation(triplet, label)

Calculate the label error correlation of a classifier triplet.

class ntqr.r2.evaluators.ErrorIndependentEvaluation(vote_counts: ntqr.r2.datasketches.TrioVoteCounts)

Evaluate three binary classifiers assuming they are error independent.

Absent labeled data, there are two logically consistent solutions given only the classifiers’ decision voting frequencies. For binary classification, this means that there are 2 possible points in evaluation space that can explain the test results. The ground truth evaluation is one of these points, provided the assumption of error independence is true.

The exact algebraic results have a virtue that few alarm systems have: they can warn about failures of their own assumption of error independence. If the two possible solutions for the ‘a’ label prevalence contain an unresolved integer square root, the classifiers are error correlated in the evaluation.

In version 0.2 the math needed to handle the almost certain detection of error correlation will be added. It is already being built, as can be seen in the ntqr.r2.postulates file, where the postulates related to computing the error correlation have been expressed using the SymPy package.

Warnings:

A. The ntqr package uses a notion of ‘error independence’ that is different from the one most familiar in the ML/AI community. There are many notions of independence in mathematics. In the context of ML/AI papers and discussions, the term ‘error independence’ is taken to be

A.1. Functional independence of distributions (P(x, y) = P(x)P(y))

The one used in the ntqr package is sample defined, since there is no probability theory used in its logic. For that reason, you must define a set of error correlation parameters. ‘Error independence’ in the ntqr package means

A.2. pair_label_correlations = 0, trio_label_correlations = 0, …

It is best to think of ‘error independence’ in the ntqr package as a property that belongs to the classifiers AND the test they took.

B. This class currently assumes that the observed classifier vote counts supplied by the user are not fake. The set of all valid observations from a classification test is much smaller than the set of all sets of eight positive integers. Future versions of the ntqr package will implement the algebraic geometry needed to detect when TrioVoteCounts objects are not explainable as observations from a classification test.

The error independent solution can fail if, in fact, the classifiers are highly correlated on the test being evaluated. Tests can fail. Future versions will implement two exceptions:

    1. PrevalenceImaginaryException

    2. NoSolutionException

The PrevalenceImaginaryException is an iron-clad detection of highly correlated classifiers. Its main utility will be in “warning light” applications in AI safety.

The NoSolutionException means that NO independent system can possibly explain the observations. There are two different reasons for this: higher error correlations, or a fake data sketch. Distinguishing between the two comes down to the same computation of error correlation.
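To make the “eight integers” in warning B concrete, the sketch below enumerates the eight vote patterns that three binary classifiers can produce and turns observed counts into exact vote frequencies. The counts are invented for illustration, and the plain dictionary is a hypothetical stand-in for a TrioVoteCounts object.

```python
from fractions import Fraction
from itertools import product

labels = ("a", "b")

# Three classifiers, two labels each: 2**3 = 8 possible vote patterns.
vote_patterns = list(product(labels, repeat=3))

# Hypothetical observed counts, one per pattern, in enumeration order.
observed = dict(zip(vote_patterns, (40, 6, 5, 4, 3, 5, 7, 30)))

# Exact vote frequencies: each count divided by the test size.
test_size = sum(observed.values())
vote_frequencies = {p: Fraction(n, test_size) for p, n in observed.items()}

for pattern, freq in vote_frequencies.items():
    print(pattern, freq)
```

Only these frequencies are available to an unsupervised evaluation; no true labels appear anywhere in the data.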

vote_counts
vote_frequencies
evaluation_exact
evaluation_float
alpha_prevalence_quadratic_terms()

Calculate the coefficients of the ‘a’ label prevalence quadratic.

If the quadratic is represented as:

a * (P_a)**2 + b * P_a + c

then,

a = terms[2], b = terms[1], c = terms[0].

The quadratic is written in the ‘standard’ way seen in algebra textbooks. Be careful not to confuse the ‘a’ and ‘b’ coefficients described here with the two labels used for classification, currently implemented as (‘a’, ‘b’).
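The indexing convention can be illustrated with a short sketch. The coefficient values are invented, and evaluate_quadratic is a hypothetical helper rather than part of the package; the only point is that the sequence is ordered from the constant term upward.

```python
from fractions import Fraction

# Hypothetical coefficients, lowest power first: terms[0] = c, terms[1] = b,
# terms[2] = a. They encode P_a**2 - P_a + 21/100.
terms = [Fraction(21, 100), Fraction(-1), Fraction(1)]


def evaluate_quadratic(terms, p_a):
    """Evaluate a * p_a**2 + b * p_a + c with a, b, c unpacked from terms."""
    c, b, a = terms
    return a * p_a**2 + b * p_a + c


# Both 3/10 and 7/10 are roots of this particular quadratic.
print(evaluate_quadratic(terms, Fraction(3, 10)))
print(evaluate_quadratic(terms, Fraction(7, 10)))
```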

alpha_prevalence_estimates()

Calculate the prevalence of the alpha label.

Because the plus/minus in the quadratic formula orders the two solutions, the solution with ‘a’ label prevalence below 50% is, by arbitrary convention, returned first.
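A rough sketch of solving such a quadratic exactly is shown below. It uses only the standard library instead of SymPy, so it is a stand-in for the package's method, and the function names and coefficients are hypothetical. It returns None when the discriminant is negative or has no exact rational square root, which loosely mirrors the “unresolved square root” signal of error correlation described for this class.

```python
from fractions import Fraction
from math import isqrt


def exact_sqrt(value):
    """Return the exact rational square root of a Fraction, or None."""
    if value < 0:
        return None
    num_root, den_root = isqrt(value.numerator), isqrt(value.denominator)
    # A Fraction is stored in lowest terms, so its square root is rational
    # only if numerator and denominator are both perfect squares.
    if num_root**2 != value.numerator or den_root**2 != value.denominator:
        return None
    return Fraction(num_root, den_root)


def prevalence_estimates(terms):
    """Solve a*P**2 + b*P + c = 0 (terms = [c, b, a], a != 0), smaller root first."""
    c, b, a = terms
    root = exact_sqrt(b * b - 4 * a * c)
    if root is None:
        return None  # square root does not resolve to a rational number
    return sorted(((-b - root) / (2 * a), (-b + root) / (2 * a)))


resolved = [Fraction(21, 100), Fraction(-1), Fraction(1)]
unresolved = [Fraction(1, 5), Fraction(-1), Fraction(1)]
print(prevalence_estimates(resolved))
print(prevalence_estimates(unresolved))
```

Sorting in ascending order places the smaller prevalence first, matching the ordering convention described above.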

classifier_a_label_accuracy(classifier: int, a_prevalence)

Calculate classifier ‘a’ label accuracies.

Parameters:
  • classifier (int) – One of (0, 1, 2).

  • a_prevalence (SymPy expression)

Return type:

The a label accuracy given the a_prevalence value

classifier_b_label_accuracy(classifier: int, a_prevalence)

Calculate classifier ‘b’ label accuracies.

Parameters:

classifier (int) – One of (0, 1, 2).

Returns:

Two possible logically consistent estimates for P_{i,b} given the test error independence assumption.

class ntqr.r2.evaluators.MajorityVotingEvaluation(vote_counts: ntqr.r2.datasketches.TrioVoteCounts)

Evaluate three binary classifiers using majority voting.

Majority voting can be used to carry out evaluation algebraically. Typically, majority voting is used with the assumption that the crowd is always right. In the context of safety, however, the assumption that the crowd is always wrong is equally valid a priori. Hence, this class returns TWO evaluations: the first assumes the crowd is always right and the second assumes it is always wrong. Its main virtue is that it is simple and rock solid: it always returns logically consistent evaluations.
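The two evaluations can be sketched as follows. This is an illustration under assumptions, not the class's implementation: the vote counts are invented, a plain dictionary stands in for TrioVoteCounts, and majority_evaluation with its flip argument is a hypothetical helper.

```python
from fractions import Fraction

# Hypothetical vote counts for the eight patterns of three binary classifiers.
vote_counts = {
    ("a", "a", "a"): 40, ("a", "a", "b"): 6, ("a", "b", "a"): 5, ("b", "a", "a"): 4,
    ("b", "b", "a"): 3, ("b", "a", "b"): 5, ("a", "b", "b"): 7, ("b", "b", "b"): 30,
}


def other_label(label):
    """Return the other binary label."""
    return "b" if label == "a" else "a"


def majority_evaluation(counts, flip=False):
    """Grade classifiers against the majority vote, or its opposite if flip."""
    totals = {"a": 0, "b": 0}
    correct = {label: [0, 0, 0] for label in totals}
    for votes, n in counts.items():
        truth = "a" if votes.count("a") >= 2 else "b"
        if flip:
            truth = other_label(truth)  # crowd assumed always wrong
        totals[truth] += n
        for i, vote in enumerate(votes):
            if vote == truth:
                correct[truth][i] += n
    size = sum(totals.values())
    prevalence = {label: Fraction(t, size) for label, t in totals.items()}
    # Assumes both labels occur at least once, so no division by zero.
    accuracy = {
        label: [Fraction(c, totals[label]) for c in correct[label]]
        for label in totals
    }
    return prevalence, accuracy


right = majority_evaluation(vote_counts)
wrong = majority_evaluation(vote_counts, flip=True)
print(right)
print(wrong)
```

In this sketch the “always wrong” evaluation swaps the label prevalences and turns each accuracy into one minus its “always right” counterpart, so both evaluations are consistent with the same observed vote counts.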

vote_patterns
vote_counts
vote_frequencies
labels = ('a', 'b')
majority_right_vote_patterns
majority_wrong_vote_patterns
evaluation_exact
evaluation_float
compute_vote_pattern_evaluation(vote_patterns, flip)
prevalences(vote_patterns)

Compute label prevalences in the test.

classifier_label_accuracy(classifier, vote_patterns, label, flip)

Compute the label accuracy for classifier.

to_float(sol)
ntqr.r2.evaluators.data_sketch