ntqr.r2.evaluators¶
Evaluators for binary classification tests (R=2).
- Classes:
SupervisedEvaluation ErrorIndependentEvaluation MajorityVotingEvaluation
Attributes¶
data_sketch
Classes¶
SupervisedEvaluation – Evaluation for experiments where the true labels are known.
ErrorIndependentEvaluation – Evaluate three binary classifiers assuming they are error independent.
MajorityVotingEvaluation – Evaluate three binary classifiers using majority voting.
Functions¶
turn_numerical(val) – Turns exact values into numerical ones.
Module Contents¶
- ntqr.r2.evaluators.turn_numerical(val)¶
Turns exact values into numerical ones.
- Parameters:
val (TYPE) – DESCRIPTION.
- Return type:
None.
- class ntqr.r2.evaluators.SupervisedEvaluation(label_counts: ntqr.r2.datasketches.TrioLabelVoteCounts)¶
Evaluation for experiments where the true labels are known.
- vote_patterns¶
- pairs = ((0, 1), (0, 2), (1, 2))¶
- label_counts¶
- evaluation_exact¶
- evaluation_float¶
- prevalences()¶
Calculate the prevalences of the two labels.
- Returns:
Mapping from labels to percentage of appearance in the test.
- Return type:
Mapping[Label, Fraction]
- classifier_label_accuracy(classifier: int, label: ntqr.r2.datasketches.Label)¶
Compute classifier label accuracy.
- other_label(label: ntqr.r2.datasketches.Label)¶
Return the other binary classification label given label.
- pair_label_error_correlation(pair, label)¶
Calculate the label error correlation of a classifier pair.
- three_way_label_error_correlation(triplet, label)¶
Calculate the label error correlation of a classifier triplet.
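The prevalence and label-accuracy quantities listed for SupervisedEvaluation can be illustrated with a minimal standalone sketch. It does not use the ntqr API: the `labeled_votes` layout and both helper functions are hypothetical, chosen only to show the arithmetic with exact fractions.

```python
from fractions import Fraction

# Hypothetical labeled test: (true label, votes of classifiers 0, 1, 2).
# This layout is an assumption for illustration, not the TrioLabelVoteCounts format.
labeled_votes = [
    ("a", ("a", "a", "b")),
    ("a", ("a", "b", "a")),
    ("a", ("b", "a", "a")),
    ("a", ("a", "a", "a")),
    ("b", ("b", "b", "a")),
    ("b", ("b", "a", "b")),
]


def prevalences(items):
    """Fraction of test items carrying each true label."""
    total = len(items)
    return {
        label: Fraction(sum(1 for true, _ in items if true == label), total)
        for label in ("a", "b")
    }


def classifier_label_accuracy(items, classifier, label):
    """Fraction of items with true label `label` that `classifier` got right."""
    votes = [pattern[classifier] for true, pattern in items if true == label]
    return Fraction(sum(1 for vote in votes if vote == label), len(votes))


print(prevalences(labeled_votes))
print(classifier_label_accuracy(labeled_votes, 0, "a"))
```

Keeping the results as `Fraction` values mirrors the exact/float split suggested by the `evaluation_exact` and `evaluation_float` attributes; converting with `float()` is only needed for display.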
- class ntqr.r2.evaluators.ErrorIndependentEvaluation(vote_counts: ntqr.r2.datasketches.TrioVoteCounts)¶
Evaluate three binary classifiers assuming they are error independent.
- Returns:
Absent labeled data, there are two logically consistent solutions
given only the classifiers' decision voting frequencies. For binary
classification, this means that there are two points in evaluation space
that can explain the test results. The ground truth evaluation is
one of these points, if the assumption of error independence is true.
The exact algebraic results have a virtue that few alarm systems
have: they can warn about the failure of their own assumption of error
independence. If the two possible solutions for the ‘a’ label prevalence
contain an unresolved integer square root, the classifiers are error
correlated in the evaluation.
In version 0.2 the math needed to handle the almost certain detection
of error correlation will be added. It is already being built, as
can be seen in the ntqr.r2.postulates file, where the postulates related to
computing the error correlation have been expressed using the SymPy
package.
Warnings
--------
A. The ntqr package uses a notion of ‘error independence’ that is
different from the one most familiar in the ML/AI community. There are
many notions of independence in mathematics. In the context of ML/AI
papers/discussions, the term ‘error independence’ is taken to mean
A.1. Functional independence of distributions (P(x, y) = P(x)P(y))
The one used in the ntqr package is sample defined since there is no
probability theory used in its logic. For that reason, you must define a
set of error correlation parameters. ‘Error independence’ in the ntqr
package means
A.2. pair_label_correlations = 0, trio_label_correlations = 0, …
It is best to think of ‘error independence’ in the ntqr package as a
property that belongs to the classifiers AND the test they took.
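As a rough illustration of why sample-defined independence belongs to the classifiers and the test together, the sketch below computes a pair correlation on one label's items. It assumes the correlation is the sample average of products of centered correctness indicators; that definition and the function name are assumptions made for this example, not taken from the ntqr source.

```python
from fractions import Fraction


def pair_error_correlation(correct_i, correct_j):
    """Sample correlation of two classifiers' correctness on one label's items.

    Each argument is a list of 0/1 flags, one per item of that label,
    where 1 means the classifier labeled the item correctly.
    Assumed definition: mean of (x_i - P_i) * (x_j - P_j) over the items.
    """
    n = len(correct_i)
    p_i = Fraction(sum(correct_i), n)
    p_j = Fraction(sum(correct_j), n)
    return sum((x - p_i) * (y - p_j) for x, y in zip(correct_i, correct_j)) / n


# Four items of the same true label: the pair is exactly uncorrelated.
on_full_test = pair_error_correlation([1, 1, 0, 0], [1, 0, 1, 0])

# Same classifiers and same decisions, but the last item is left out of the test.
on_smaller_test = pair_error_correlation([1, 1, 0], [1, 0, 1])

print(on_full_test, on_smaller_test)
```

The classifiers' decisions are unchanged between the two calls; only the set of test items differs, and that alone moves the correlation away from zero.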
B. This class currently assumes that the observed classifier
vote counts supplied by the user are not fake. The set of all valid
observations from a classification test is much smaller than the set
of all sets of eight positive integers. Future versions of the ntqr
package will implement the algebraic geometry needed to detect when
TrioVoteCounts objects are not explainable as observations from a
classification test.
The error independent solution can fail if, in fact, the classifiers
are highly correlated on the test being evaluated. Tests can fail.
Future versions will implement the following exceptions:
PrevalenceImaginaryException
NoSolutionException
The PrevalenceImaginaryException is an ironclad detection of highly
correlated classifiers. Its main utility will be in “warning light”
applications in AI safety.
The NoSolutionException means that NO independent system can possibly
explain the observations. There are two possible reasons for this:
higher error correlations, or a fake data sketch. Distinguishing
between the two comes down to the same computation of error correlation.
- vote_counts¶
- vote_frequencies¶
- evaluation_exact¶
- evaluation_float¶
- alpha_prevalence_quadratic_terms()¶
Calculate the coefficients of the ‘a’ label prevalence quadratic.
- If the quadratic is represented as:
a * (P_a)**2 + b * P_a + c
- then,
a = terms[2], b = terms[1], c = terms[0].
The quadratic is written in the ‘standard’ way seen in algebra textbooks. Be careful not to confuse the ‘a’ and ‘b’ coefficients described here with the two labels used for classification, currently implemented as (‘a’, ‘b’).
- alpha_prevalence_estimates()¶
Calculate the prevalence of the alpha label.
Since the plus/minus operations order the solutions of the quadratic equation, the solution in which the ‘a’ label prevalence is less than 50% is, arbitrarily, returned first.
- classifier_a_label_accuracy(classifier: int, a_prevalence)¶
Calculate classifier ‘a’ label accuracies.
- Parameters:
classifier (int) – One of (0, 1, 2).
a_prevalence (Sympy expression)
- Return type:
The a label accuracy given the a_prevalence value
- classifier_b_label_accuracy(classifier: int, a_prevalence)¶
Calculate classifier ‘b’ label accuracies.
- Parameters:
classifier (int) – One of (0, 1, 2).
- Returns:
Two possible logically consistent estimates for P_{i,b} given the
test error independence assumption.
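The prevalence quadratic and its square-root warning can be sketched as follows. This is a minimal standalone illustration that uses only the standard library instead of SymPy; `exact_sqrt` and `prevalence_candidates` are hypothetical helpers, and the coefficient values are made up. It follows the documented ordering `a = terms[2], b = terms[1], c = terms[0]` and returns the smaller root first.

```python
from fractions import Fraction
from math import isqrt


def exact_sqrt(value):
    """Return the exact rational square root of a Fraction, or None if none exists."""
    if value < 0:
        return None
    num_root, den_root = isqrt(value.numerator), isqrt(value.denominator)
    if num_root**2 != value.numerator or den_root**2 != value.denominator:
        return None
    return Fraction(num_root, den_root)


def prevalence_candidates(terms):
    """Solve a*P**2 + b*P + c = 0 exactly, with terms = [c, b, a].

    Returns (smaller root, larger root), or None when the square root of
    the discriminant does not resolve to a rational number.
    """
    c, b, a = (Fraction(t) for t in terms)
    root = exact_sqrt(b * b - 4 * a * c)
    if root is None:
        return None
    low, high = sorted(((-b - root) / (2 * a), (-b + root) / (2 * a)))
    return low, high


print(prevalence_candidates([6, -25, 25]))  # square root resolves
print(prevalence_candidates([5, -25, 25]))  # square root of 125 does not resolve
```

In this sketch a `None` result stands in for the warning described in the class docstring: an unresolved square root, or a negative discriminant, means the error-independence assumption cannot explain the counts as given.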
- class ntqr.r2.evaluators.MajorityVotingEvaluation(vote_counts: ntqr.r2.datasketches.TrioVoteCounts)¶
Evaluate three binary classifiers using majority voting.
Majority voting can be used to carry out evaluation algebraically. Typically, majority voting is used with the assumption that the crowd is always right. In the context of safety, however, the assumption that the crowd is always wrong is equally valid a priori. Hence, this class returns two evaluations: the first assumes the crowd is always right and the second assumes it is always wrong. Its main virtue is that it is simple and rock solid: it always returns logically consistent evaluations.
- vote_patterns¶
- vote_counts¶
- vote_frequencies¶
- labels = ('a', 'b')¶
- majority_right_vote_patterns¶
- majority_wrong_vote_patterns¶
- evaluation_exact¶
- evaluation_float¶
- compute_vote_pattern_evaluation(vote_patterns, flip)¶
- prevalences(vote_patterns)¶
Compute label prevalences in the test.
- classifier_label_accuracy(classifier, vote_patterns, label, flip)¶
Compute the label accuracy for classifier.
- to_float(sol)¶
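The two majority-voting evaluations can be sketched in a few lines of standalone Python. Nothing here is the ntqr implementation: the `vote_counts` dictionary layout, the helper names, and the counts themselves are assumptions for illustration. The `flip` argument plays the role suggested by the method signatures above, switching from "the majority is right" to "the majority is wrong".

```python
from fractions import Fraction

# Hypothetical vote counts keyed by the votes of classifiers (0, 1, 2).
vote_counts = {
    ("a", "a", "a"): 4, ("a", "a", "b"): 2, ("a", "b", "a"): 1, ("b", "a", "a"): 1,
    ("b", "b", "a"): 1, ("b", "a", "b"): 2, ("a", "b", "b"): 1, ("b", "b", "b"): 8,
}


def majority(pattern):
    """Label chosen by at least two of the three classifiers."""
    return "a" if pattern.count("a") >= 2 else "b"


def assumed_label(pattern, flip):
    """True label under the assumption that the majority is right (or wrong if flip)."""
    winner = majority(pattern)
    if not flip:
        return winner
    return "b" if winner == "a" else "a"


def majority_vote_evaluation(counts, flip=False):
    """Prevalence and per-classifier accuracy for each label.

    Assumes every label is assigned to at least one observed pattern.
    """
    total = sum(counts.values())
    evaluation = {}
    for label in ("a", "b"):
        assumed = {p: n for p, n in counts.items() if assumed_label(p, flip) == label}
        size = sum(assumed.values())
        accuracies = [
            Fraction(sum(n for p, n in assumed.items() if p[i] == label), size)
            for i in range(3)
        ]
        evaluation[label] = {"prevalence": Fraction(size, total), "accuracies": accuracies}
    return evaluation


crowd_right = majority_vote_evaluation(vote_counts)
crowd_wrong = majority_vote_evaluation(vote_counts, flip=True)
print(crowd_right["a"], crowd_wrong["a"])
```

Because both evaluations are computed directly from the observed counts, every prevalence and accuracy is a fraction between 0 and 1, which is the sense in which this approach always yields a logically consistent answer.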
- ntqr.r2.evaluators.data_sketch¶