ntqr¶
Evaluate noisy decision makers using logic and algebra.
Classes:
Functions:
Misc variables:
__version__ uci_adult_test_example
Submodules¶
Attributes¶
Classes¶
- TrioLabelVoteCounts: Data class for the by-label aligned votes of three binary classifiers.
- TrioVoteCounts: Data class to validate the test counts for three binary classifiers.
- ErrorIndependentEvaluation: Evaluate three binary classifiers assuming they are error independent.
- MajorityVotingEvaluation: Evaluate three binary classifiers using majority voting.
- SupervisedEvaluation: Evaluation for experiments where the true labels are known.
- Label: Label object to guarantee a label is stringifiable.
- Labels: Labels used in test question responses. The NTQR package assumes that all test questions have the same, fixed, label set.
Functions¶
Package Contents¶
- ntqr.__version__ = '0.8'¶
- ntqr.uciadult_label_counts: ntqr.r2.datasketches.LabelVoteCounts¶
- class ntqr.TrioLabelVoteCounts(label_vote_counts)¶
Data class for the by-label aligned votes of three binary classifiers.
This class is only useful in an experimental setting where one has observed a test with labeled data. Initialized with a Mapping[Label, Mapping[Votes, int]] of the form:
{
'a': {('a', 'a', 'a'): int, …, ('b', 'b', 'b'): int},
'b': {('a', 'a', 'a'): int, …, ('b', 'b', 'b'): int}
}
DEPRECATED: This class is being replaced with the upcoming LabelVoteCounts for an arbitrary number of classifiers.
- vote_patterns¶
- pairs = ((0, 1), (0, 2), (1, 2))¶
- label_vote_counts¶
- to_vote_counts() VoteCounts¶
Turn by-label counts into by-vote-pattern counts.
- Using:
{'a': {…, ('a', 'b', 'a'): x, …},
'b': {…, ('a', 'b', 'a'): y, …}}
- Returns:
{…, ('a', 'b', 'a'): x+y, …}
- Return type:
VoteCounts
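The summation described above can be sketched in plain Python. This is only an illustration of the arithmetic, not the package's implementation; the helper name `sum_across_labels` and the counts are hypothetical.

```python
from collections import Counter

# Hypothetical by-label counts for three classifiers (illustrative numbers only).
label_vote_counts = {
    "a": {("a", "a", "a"): 5, ("a", "b", "a"): 2},
    "b": {("a", "b", "a"): 3, ("b", "b", "b"): 4},
}


def sum_across_labels(by_label):
    """Collapse {label: {votes: count}} into {votes: count}."""
    totals = Counter()
    for votes_to_count in by_label.values():
        # Counter.update adds the counts of matching vote patterns.
        totals.update(votes_to_count)
    return dict(totals)


vote_counts = sum_across_labels(label_vote_counts)
print(vote_counts[("a", "b", "a")])  # x + y = 2 + 3
```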
- to_TrioVoteCounts()¶
Return TrioVoteCounts object by summing votes across labels.
- to_voting_frequency_fractions() VoteFrequencies¶
Compute observed voting pattern frequencies.
- Return type:
Mapping[Votes, Fraction]
- to_voting_frequencies_float() Mapping[Votes, float]¶
Compute observed voting frequencies inexactly, as floats.
- Return type:
Mapping[Votes, float]
- class ntqr.TrioVoteCounts¶
Data class to validate the test counts for three binary classifiers.
- Initialized with a Mapping[Votes, int] of the form:
{('a', 'a', 'a'): int, …, ('b', 'b', 'b'): int}
This is the class used for evaluation on unlabeled data, where we only have access to the aligned decisions of the binary classifiers and have no knowledge of the true label of any item that was classified.
DEPRECATED: This class will be replaced with the ObservedVoteCounts class that can handle an arbitrary number of classifiers.
- vote_patterns¶
- pairs = ((0, 1), (0, 2), (1, 2))¶
- vote_counts: VoteCounts¶
- __post_init__()¶
Check we have counts for a valid evaluation of binary classifiers.
No negative counts.
Initialize all possible vote patterns by the trio.
The empty test - all counts zero - is not allowed.
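The three checks listed above might look roughly like the following standalone function. It is a sketch under the assumption that missing vote patterns are filled in with zero; the function name `validate_trio_counts` is hypothetical and the package's own `__post_init__` may behave differently.

```python
from itertools import product

LABELS = ("a", "b")  # the two binary labels used on this page


def validate_trio_counts(vote_counts):
    """Return counts for all eight trio vote patterns, or raise ValueError."""
    # Check 1: no negative counts.
    if any(count < 0 for count in vote_counts.values()):
        raise ValueError("vote counts must be non-negative")
    # Check 2: initialize every possible vote pattern (assumed default: 0).
    full = {
        pattern: vote_counts.get(pattern, 0)
        for pattern in product(LABELS, repeat=3)
    }
    # Check 3: the empty test (all counts zero) is not allowed.
    if sum(full.values()) == 0:
        raise ValueError("the empty test (all counts zero) is not allowed")
    return full


checked = validate_trio_counts({("a", "a", "a"): 3, ("b", "b", "a"): 1})
print(len(checked))
```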
- to_frequencies_exact() VoteFrequencies¶
Turn vote integer counts to exact Fraction objects.
- Returns:
Maps a trio vote pattern to its Fraction occurrence in the test.
- Return type:
VoteFrequencies
- to_frequencies_float() Mapping[Votes, float]¶
Compute observed voting pattern frequencies, inexactly, as floats.
- Returns:
Maps a trio vote pattern to its percentage occurrence in the test as an inexact float.
- Return type:
Mapping[Votes, float]
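The difference between the exact and the float frequencies can be illustrated with the standard library. This sketch uses `fractions.Fraction` as a stand-in for the exact rational type and hypothetical counts; it does not call the package.

```python
from fractions import Fraction

# Hypothetical vote counts (illustrative numbers only).
vote_counts = {("a", "a", "a"): 3, ("a", "b", "a"): 1, ("b", "b", "b"): 4}

test_size = sum(vote_counts.values())

# Exact frequencies: each count divided by the test size, kept as a rational.
exact = {votes: Fraction(count, test_size) for votes, count in vote_counts.items()}

# Inexact frequencies: the same values converted to floats.
as_float = {votes: float(freq) for votes, freq in exact.items()}

print(exact[("a", "a", "a")], as_float[("a", "a", "a")])
```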
- classifier_label_frequency(classifier: int, label: Label) sympy.Rational¶
Calculate classifier label voting frequency.
- Parameters:
classifier (int) – The index of the classifier.
label (Label) – The label.
- Returns:
The fraction of times the classifier voted the label when classifying items in the test, that is, the label vote count divided by the test size.
- Return type:
sympy.Rational
- classifier_label_responses(classifier: int, label: Label) int¶
Calculate the number of responses with label by classifier.
- Parameters:
classifier (int) – The index of the classifier.
label (Label) – The label.
- Returns:
Number of times the classifier decided an item was label.
- Return type:
int
- pair_label_frequency(pair: Iterable[int], label: Label) sympy.Rational¶
Compute frequency of times a pair voted with the same label.
- Parameters:
pair (Iterable[int, int]) – Classifier indices.
label (Label) – The label.
- Returns:
The fraction of times a pair of classifiers voted with the same label when classifying items in the test.
- Return type:
sympy.Rational
- pair_label_responses(pair: Iterable[int], label: Label) sympy.Rational¶
Compute the number of times a pair voted with the same label.
- Parameters:
pair (Iterable[int, int]) – Classifier indices.
label (Label) – The label.
- Returns:
Number of items on which the pair voted with the same label.
- Return type:
int
- pair_frequency_moment(pair: Iterable[int], label: Label) sympy.Rational¶
Calculate the label classifier pair frequency moment.
If (i, j) = pair, then this is
f_{label_i, label_j} - f_{label_i} * f_{label_j}
that is, the fraction of times the classifier pair voted with the same label minus the product of their individual label voting frequencies.
- Parameters:
pair (Iterable(int, int)) – The pair of classifiers.
label (Label) – One of the binary labels.
- Returns:
The pair frequency moment as a Fraction object
- Return type:
sympy.Rational
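The moment formula above can be worked through on a small example. The functions below are standalone helpers that reuse the documented method names for readability only; they use `fractions.Fraction` rather than `sympy.Rational`, and the counts are hypothetical.

```python
from fractions import Fraction

# Hypothetical test of 12 items (illustrative numbers only).
vote_counts = {
    ("a", "a", "a"): 4, ("a", "a", "b"): 1, ("a", "b", "a"): 1, ("b", "a", "a"): 1,
    ("b", "b", "a"): 1, ("b", "a", "b"): 1, ("a", "b", "b"): 1, ("b", "b", "b"): 2,
}
test_size = sum(vote_counts.values())


def classifier_label_frequency(classifier, label):
    """Fraction of items on which one classifier voted `label`."""
    hits = sum(c for votes, c in vote_counts.items() if votes[classifier] == label)
    return Fraction(hits, test_size)


def pair_label_frequency(pair, label):
    """Fraction of items on which both classifiers in `pair` voted `label`."""
    i, j = pair
    hits = sum(
        c for votes, c in vote_counts.items()
        if votes[i] == label and votes[j] == label
    )
    return Fraction(hits, test_size)


def pair_frequency_moment(pair, label):
    """f_{label_i, label_j} - f_{label_i} * f_{label_j} for (i, j) = pair."""
    i, j = pair
    return pair_label_frequency(pair, label) - (
        classifier_label_frequency(i, label) * classifier_label_frequency(j, label)
    )


moment = pair_frequency_moment((0, 1), "a")
print(moment)  # 5/12 - (7/12) * (7/12) = 11/144
```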
- label_pairs_frequency_moments(label: Label) Mapping[tuple[int, int], sympy.Rational]¶
All the label pair frequency moments.
- trio_frequency_moment() sympy.Rational¶
Calculate the 3rd frequency moment of a trio of binary classifiers.
Don’t ask.
- class ntqr.ErrorIndependentEvaluation(vote_counts: ntqr.r2.datasketches.TrioVoteCounts)¶
Evaluate three binary classifiers assuming they are error independent.
Absent labeled data, there are two logically consistent solutions given only the classifiers' decision voting frequencies. For binary classification, this means that there are two points in evaluation space that can explain the test results. The ground truth evaluation is one of these points, if the assumption of error independence is true.
The exact algebraic results have a virtue that few alarm systems have: they can warn about the failure of their own assumption of error independence. If the two possible solutions for the ‘a’ label prevalence contain an unresolved integer square root, the classifiers are error correlated in the evaluation.
In version 0.2 the math needed to handle the almost certain detection of error correlation will be added. It is already being built, as can be seen in the ntqr.r2.postulates file, where the postulates related to computing the error correlation have been expressed using the SymPy package.
Warnings
———
A. The ntqr package uses a notion of ‘error independence’ that is
different than the one most familiar in the ML/AI community. There are
many notions of independence in mathematics. In the context of ML/AI
papers/discussions, the term ‘error independence’ is taken to be
A.1. Functional independence of distributions (P(x, y) = P(x)P(y))
The one used in the ntqr package is sample defined since there is no
probability theory used in its logic. For that reason, you must define a
set of error correlation parameters. ‘Error independence’ in the ntqr
package means
A.2. pair_label_correlations = 0, trio_label_correlations = 0, …
It is best to think of ‘error independence’ in the ntqr package as a
property that belongs to the classifiers AND the test they took.
B. This class currently assumes that the observed classifier
vote counts supplied by the user are not fake. The set of all valid
observations from a classification test is much smaller than the set
of all sets of eight positive integers. Future versions of the ntqr
package will implement the algebraic geometry needed to detect when
TrioVoteCounts objects are not explainable as observations from a
classification test.
The error independent solution can fail if, in fact, the classifiers
are highly correlated on the test being evaluated. Tests can fail.
Future versions will implement the exceptions:
PrevalenceImaginaryException
NoSolutionException
The PrevalenceImaginaryException is an ironclad detection of highly
correlated classifiers. Its main utility will be in “warning light”
applications in AI safety.
The NoSolutionException means that NO independent system can possibly
explain the observations. There are two different reasons for this -
higher error correlations, or the data sketch is fake. Distinguishing
between the two comes down to the same computation of error correlation.
- vote_counts¶
- vote_frequencies¶
- evaluation_exact¶
- evaluation_float¶
- alpha_prevalence_quadratic_terms()¶
Calculate the coefficients of the ‘a’ label prevalence quadratic.
- If the quadratic is represented as:
a * (P_a)**2 + b * P_a + c
- then,
a = terms[2], b = terms[1], c = terms[0].
The quadratic is written in the ‘standard’ way seen in algebra textbooks. Be careful to not mistake the ‘a’ or ‘b’ coefficients described here with the two labels being used for classification - currently implemented as (‘a’, ‘b’).
- alpha_prevalence_estimates()¶
Calculate the prevalence of the alpha label.
The two solutions of the quadratic are distinguished only by the plus/minus sign in the quadratic formula, so we arbitrarily return first the solution in which the ‘a’ label prevalence is less than 50%.
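How a coefficient list in the order terms[0] = c, terms[1] = b, terms[2] = a leads to two candidate prevalences can be sketched as follows. The coefficients are hypothetical, the helper names are invented for this illustration, and returning None when the square root does not resolve exactly is an assumption of the sketch, not the package's behavior. It also assumes a is non-zero.

```python
from fractions import Fraction
from math import isqrt


def exact_sqrt(value):
    """Return the exact rational square root of a Fraction, or None."""
    if value < 0:
        return None  # imaginary root
    # A Fraction is stored in lowest terms, so it is a perfect square
    # exactly when its numerator and denominator both are.
    num_root, den_root = isqrt(value.numerator), isqrt(value.denominator)
    if num_root**2 == value.numerator and den_root**2 == value.denominator:
        return Fraction(num_root, den_root)
    return None  # unresolved square root


def prevalence_candidates(terms):
    """Solve a * P**2 + b * P + c = 0 exactly, smaller root first."""
    c, b, a = (Fraction(t) for t in terms)
    root = exact_sqrt(b * b - 4 * a * c)
    if root is None:
        return None
    return tuple(sorted(((-b - root) / (2 * a), (-b + root) / (2 * a))))


# Hypothetical coefficients: P**2 - P + 21/100 = 0.
candidates = prevalence_candidates([Fraction(21, 100), -1, 1])
print(candidates)
```

In this sketch a None result plays the role of the warning described in the class documentation: the discriminant has no exact rational square root, or is negative.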
- classifier_a_label_accuracy(classifier: int, a_prevalence)¶
Calculate classifier ‘a’ label accuracies.
- Parameters:
classifier (int) – One of (0, 1, 2).
a_prevalence (Sympy expression)
- Returns:
The ‘a’ label accuracy given the a_prevalence value.
- classifier_b_label_accuracy(classifier: int, a_prevalence)¶
Calculate classifier ‘b’ label accuracies.
- Parameters:
classifier (int) – One of (0, 1, 2).
- Returns:
Two possible logically consistent estimates for P_{i,b} given the
test error independence assumption.
- class ntqr.MajorityVotingEvaluation(vote_counts: ntqr.r2.datasketches.TrioVoteCounts)¶
Evaluate three binary classifiers using majority voting.
Majority voting can be used to carry out evaluation algebraically. Typically, majority voting is used with the assumption that the crowd is always right. In the context of safety, however, that the crowd is always wrong is an equally valid a-priori assumption. Hence, this class returns TWO evaluations. The first assuming the crowd is always right and the second assuming they are always wrong. Its main virtue is that it is simple and rock solid - always returns logically consistent evaluations.
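The idea of producing two evaluations, one assuming the majority is always right and one assuming it is always wrong, can be sketched without the package. The function `evaluate` and its `majority_is_right` flag are hypothetical names for this illustration, and the counts are made up; the class's own methods may compute these quantities differently.

```python
from fractions import Fraction

# Hypothetical test of 12 items (illustrative numbers only).
vote_counts = {
    ("a", "a", "a"): 4, ("a", "a", "b"): 1, ("a", "b", "a"): 1, ("b", "a", "a"): 1,
    ("b", "b", "a"): 1, ("b", "a", "b"): 1, ("a", "b", "b"): 1, ("b", "b", "b"): 2,
}


def majority_label(votes):
    """Label chosen by at least two of the three classifiers."""
    return "a" if votes.count("a") >= 2 else "b"


def other_label(label):
    return "b" if label == "a" else "a"


def evaluate(counts, majority_is_right):
    """Prevalences and per-classifier label accuracies under one assumption."""
    test_size = sum(counts.values())
    by_label = {"a": {}, "b": {}}
    for votes, count in counts.items():
        crowd = majority_label(votes)
        # The assumed true label is the majority vote, or its flip.
        assumed = crowd if majority_is_right else other_label(crowd)
        by_label[assumed][votes] = count
    prevalences = {
        label: Fraction(sum(patterns.values()), test_size)
        for label, patterns in by_label.items()
    }
    accuracies = {}
    for label, patterns in by_label.items():
        total = sum(patterns.values())
        accuracies[label] = [
            Fraction(sum(c for v, c in patterns.items() if v[k] == label), total)
            if total else None  # accuracy is undefined when a label never occurs
            for k in range(3)
        ]
    return prevalences, accuracies


right_prev, right_acc = evaluate(vote_counts, majority_is_right=True)
wrong_prev, wrong_acc = evaluate(vote_counts, majority_is_right=False)
print(right_prev["a"], wrong_prev["a"])
```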
- vote_patterns¶
- vote_counts¶
- vote_frequencies¶
- labels = ('a', 'b')¶
- majority_right_vote_patterns¶
- majority_wrong_vote_patterns¶
- evaluation_exact¶
- evaluation_float¶
- compute_vote_pattern_evaluation(vote_patterns, flip)¶
- prevalences(vote_patterns)¶
Compute label prevalences in the test.
- classifier_label_accuracy(classifier, vote_patterns, label, flip)¶
Compute the label accuracy for classifier.
- to_float(sol)¶
- class ntqr.SupervisedEvaluation(label_counts: ntqr.r2.datasketches.TrioLabelVoteCounts)¶
Evaluation for experiments where the true labels are known.
- vote_patterns¶
- pairs = ((0, 1), (0, 2), (1, 2))¶
- label_counts¶
- evaluation_exact¶
- evaluation_float¶
- prevalences()¶
Calculate the prevalences of the two labels.
- Returns:
Mapping from labels to percentage of appearance in the test.
- Return type:
Mapping[Label, Fraction]
- classifier_label_accuracy(classifier: int, label: ntqr.r2.datasketches.Label)¶
Compute classifier label accuracy.
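When the true labels are known, prevalences and label accuracies reduce to simple ratios of the by-label counts. The sketch below shows that arithmetic with hypothetical counts and a standalone function that borrows the documented method name; it uses `fractions.Fraction` and does not call the package.

```python
from fractions import Fraction

# Hypothetical by-label counts: true label -> vote pattern -> count.
label_counts = {
    "a": {("a", "a", "a"): 5, ("a", "b", "a"): 2, ("b", "b", "b"): 1},
    "b": {("a", "b", "a"): 1, ("b", "b", "b"): 3},
}

test_size = sum(sum(counts.values()) for counts in label_counts.values())

# Prevalence of a label: items with that true label over the test size.
prevalences = {
    label: Fraction(sum(counts.values()), test_size)
    for label, counts in label_counts.items()
}


def classifier_label_accuracy(classifier, label):
    """Fraction of items with true label `label` that the classifier got right."""
    counts = label_counts[label]
    correct = sum(c for votes, c in counts.items() if votes[classifier] == label)
    return Fraction(correct, sum(counts.values()))


print(prevalences["a"], classifier_label_accuracy(1, "a"))
```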
- other_label(label: ntqr.r2.datasketches.Label)¶
Return the other binary classification label given label.
- pair_label_error_correlation(pair, label)¶
Calculate the label error correlation of a classifier pair.
- three_way_label_error_correlation(triplet, label)¶
Calculate the label error correlation of a classifier triplet.
- class ntqr.Label(label)¶
Bases: str
Label object to guarantee a label is stringifiable.
- _label¶
- __str__()¶
Return str(self).
- class ntqr.Labels(labels: Iterable[str])¶
Bases: tuple
Labels used in test question responses. The NTQR package assumes that all test questions have the same, fixed, label set. These are a tuple-like object so the user can specify the canonical order of the labels.
The number of labels defines the integer R in the acronym NTQR. Binary classification: R=2. Three-label classification: R=3, etc.
- _labels¶
- ntqr.copy_notebooks()¶