ntqr.alarms

Algorithms for logical alarms based on the axioms.

Formal verification of unsupervised evaluations is carried out by using the agreements and disagreements between classifiers to detect whether they are misaligned with respect to a safety specification.

The ‘atomic’ logical test for the alarms examines the group evaluations that are logically consistent with two things: how the classifiers aligned in their decisions, and an assumed count of each label in the true, but unknown, answer key for the exam.

For example, in a test with three possible responses or classes for each question, we need to specify,

qs = (q_label_1, q_label_2, q_label_3)

where

sum(qs) = Q,

with Q the size of the test. For example, a test with Q = 10 could have the qs setting (5, 3, 2), since 5 + 3 + 2 = 10.
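For a small test, every qs setting that a test of size Q admits can be listed directly. The sketch below is illustrative only: all_qs is a hypothetical helper, not part of ntqr. It keeps every tuple of R non-negative label counts that sums to Q.

```python
from itertools import product


def all_qs(Q: int, R: int) -> list[tuple[int, ...]]:
    """Hypothetical helper: every qs = (q_label_1, ..., q_label_R) with sum(qs) == Q."""
    # Brute force over (Q + 1)**R candidates, filtered by the sum(qs) = Q constraint.
    return [qs for qs in product(range(Q + 1), repeat=R) if sum(qs) == Q]


settings = all_qs(10, 3)
print(len(settings))          # 66
print((5, 3, 2) in settings)  # True
```

The number of settings is C(Q + R - 1, R - 1), so sweeping every qs grows quickly as the number of labels R increases.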

This atomic misalignment test at a fixed qs value lets you build custom alarms suited to your application domain. Some examples:

1. You expect the prevalence of classes in your tests to be skewed, so that one label or another appears in only a small number of questions. In that case, with my_range() your own iterable of plausible qs settings, you can construct an alarm as,

all(alarm.misaligned_at_qs(qs, responses) for qs in my_range())

2. The method ntqr.alarms.SingleClassifierAxiomsAlarm.are_misaligned is a test for fully unsupervised settings. For a binary test it is equivalent to,

all(alarm.misaligned_at_qs((qa, Q - qa), responses) for qa in range(0, Q + 1))

That is, the only information it uses is the classifiers’ responses and the size of the test, Q.

3. You believe that your classifiers are high performing and will therefore only accept (q_label_1, q_label_2, …) settings for which all your classifiers are better than x% at detecting every label. This turns the atomic logical test into a measuring instrument for the prevalence of the labels in the tested dataset.

The method

SingleClassifierAxiomsAlarm.are_misaligned(responses)

is the fully unsupervised version of what logical alarms can do. It detects (imperfectly!) whether at least one member of an ensemble is violating a user-provided safety specification when doing classification with R classes.
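Items 1 to 3 above can be sketched by treating the atomic test as a black-box callable. In this sketch, atomic stands in for something like lambda qs: alarm.misaligned_at_qs(qs, responses); the helper names restricted_alarm and plausible_prevalences, and the toy atomic test, are hypothetical and only illustrate how the building block composes.

```python
from collections.abc import Callable, Iterable, Sequence

# An atomic test maps a qs setting to True when the responses are misaligned there.
AtomicTest = Callable[[Sequence[int]], bool]


def restricted_alarm(atomic: AtomicTest, candidate_qs: Iterable[Sequence[int]]) -> bool:
    """Item 1: raise the alarm only if every candidate qs setting is misaligned."""
    return all(atomic(qs) for qs in candidate_qs)


def plausible_prevalences(atomic: AtomicTest,
                          candidate_qs: Iterable[Sequence[int]]) -> list[Sequence[int]]:
    """Item 3: keep the qs settings at which no classifier violates the specification."""
    return [qs for qs in candidate_qs if not atomic(qs)]


def toy_atomic(qs: Sequence[int]) -> bool:
    """Hypothetical stand-in: the responses are safe only when q_label_1 is 4, 5 or 6."""
    return not 4 <= qs[0] <= 6


Q = 10
skewed = [(qa, Q - qa) for qa in (0, 1, 2, 8, 9, 10)]  # item 1: only skewed prevalences
everything = [(qa, Q - qa) for qa in range(Q + 1)]      # item 2: every binary qs

print(restricted_alarm(toy_atomic, skewed))           # True
print(restricted_alarm(toy_atomic, everything))       # False
print(plausible_prevalences(toy_atomic, everything))  # [(4, 6), (5, 5), (6, 4)]
```

A stricter specification can only shrink the list returned by plausible_prevalences, since it removes safe evaluations without adding any; that is what makes the surviving qs settings usable as a prevalence measurement.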

The name ‘are_misaligned’ is meant to make clear that this method detects when classifiers are misaligned, which is not the same thing as detecting whether they are correct. If a pair of classifiers is tested and both are wrong in the same way, .are_misaligned will return False.

The user is encouraged to think of these alarms as building blocks for algorithms that follow the philosophy of error-detecting codes. For example, with three classifiers, as long as one of them is behaving correctly, .are_misaligned will return True.

Classes

`SingleClassifierAxiomsAlarm`	Alarm based on the single classifier axioms for the ensemble members.
`LabelsSafetySpecification`	Simple example of safety specification for each label.
`GradeSafetySpecification`	Simple example of a grade safety specification.

Module Contents

class ntqr.alarms.SingleClassifierAxiomsAlarm(Q: int, classifiers_axioms: collections.abc.Sequence[ntqr.r2.raxioms.SingleClassifierAxioms | ntqr.r3.raxioms.SingleClassifierAxioms], cls_single_evals: ntqr.evaluations.SingleClassifierEvaluations)

Alarm based on the single classifier axioms for the ensemble members.

Although this alarm considers only single-classifier axioms, the axioms of all ensemble members share the variables for the number of questions of each type in a test. For example, a binary test has two question types. This allows us to consider which evaluations are possible for a group of classifiers at a fixed number of questions of each type.

Said another way, when we consider only each classifier’s individual response counts, we are aligning the group responses over the whole test, not over individual questions in it. Future classes will consider what happens when we count how pairs of classifiers are aligned at the question level.
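For a binary test, the constraint that ties one classifier’s response counts to a fixed qs setting follows from counting. The following is a sketch for the binary case only, assuming every question receives exactly one of the labels a or b; the symbols R_a, R_b (response counts) and c_a, c_b (assumed correct counts) are notation introduced here, not taken from ntqr.

```latex
\begin{aligned}
Q_a + Q_b &= Q, \qquad R_a + R_b = Q, \\
R_a &= c_a + (Q_b - c_b), \qquad 0 \le c_a \le Q_a, \quad 0 \le c_b \le Q_b .
\end{aligned}
```

Every response a is either a correct answer to an a question or an error on a b question, so at fixed (Q_a, Q_b) each classifier’s candidate evaluations lie on the line c_a - c_b = R_a - Q_b. Because all ensemble members share Q_a and Q_b, fixing qs constrains all of them at once.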

Q

classifiers_axioms

labels

evals

set_safety_specification(factors: collections.abc.Sequence[int]) → None

Set the alarm’s safety specification from the given factors.

Currently defaults to LabelsSafetySpecification.

Parameters: factors (Sequence[int]) – One factor per label; each label’s assumed number of correct responses must satisfy factor*q_l_correct - q_l > 0
Return type: None
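The condition factor*q_l_correct - q_l > 0 is a per-label accuracy floor: for q_l > 0 it holds exactly when q_l_correct / q_l > 1/factor, so a factor of 2 asks for better than 50% accuracy on that label. The hypothetical helper min_correct below, which assumes positive integer factors, computes the smallest correct count that passes.

```python
def min_correct(factor: int, q_l: int) -> int:
    """Hypothetical helper: smallest q_l_correct with factor * q_l_correct - q_l > 0."""
    # factor * c > q_l  <=>  c > q_l / factor  <=>  c >= q_l // factor + 1 for integers.
    return q_l // factor + 1


for factor in (2, 3, 4):
    c = min_correct(factor, 10)
    print(f"factor={factor}: need at least {c} of 10 correct ({c / 10:.0%})")
# factor=2: need at least 6 of 10 correct (60%)
# factor=3: need at least 4 of 10 correct (40%)
# factor=4: need at least 3 of 10 correct (30%)
```

Larger factors are looser. A factor of 1 can never be met, because it would require more correct answers than there are questions for that label.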

misaligned_at_qs(qs: collections.abc.Sequence[int], responses: collections.abc.Sequence[collections.abc.Sequence[int]]) → bool

Test whether the responses are misaligned at the given qs setting.

Parameters:

qs (Sequence[int]) – Count of each label in the answer key.
responses (Sequence[Sequence[int]]) – The number of label responses by each classifier

Returns:

Whether one or more classifiers violated the safety specification.

Return type:

bool
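To make the atomic test concrete, here is a self-contained sketch for the binary case only. It is not ntqr’s implementation: it enumerates each classifier’s logically consistent evaluations (c_a, c_b) at the given qs from the counting identity R_a = c_a + (Q_b - c_b), then applies the label safety specification factor*q_label_correct - q_label > 0. The functions consistent_evals and misaligned_at_qs here are hypothetical free functions.

```python
from collections.abc import Sequence


def consistent_evals(qs: Sequence[int], rs: Sequence[int]) -> list[tuple[int, int]]:
    """Binary case: every (c_a, c_b) consistent with label counts qs and responses rs."""
    q_a, q_b = qs
    r_a = rs[0]
    # Each 'a' response is either a correct 'a' or an error on a 'b' question.
    return [
        (c_a, c_b)
        for c_a in range(q_a + 1)
        for c_b in range(q_b + 1)
        if c_a + (q_b - c_b) == r_a
    ]


def misaligned_at_qs(qs: Sequence[int], responses: Sequence[Sequence[int]],
                     factors: Sequence[int]) -> bool:
    """True if at least one classifier has no consistent evaluation meeting the spec."""

    def is_safe(evaluation: tuple[int, int]) -> bool:
        # Label safety specification: factor * q_label_correct - q_label > 0.
        return all(f * c - q > 0 for f, c, q in zip(factors, evaluation, qs))

    return any(
        not any(is_safe(e) for e in consistent_evals(qs, rs)) for rs in responses
    )


responses = [(6, 4), (5, 5)]  # label counts from two classifiers on a Q = 10 test
print(misaligned_at_qs((5, 5), responses, factors=(2, 2)))  # False
print(misaligned_at_qs((1, 9), responses, factors=(2, 2)))  # True
```

At qs = (1, 9) the first classifier’s only consistent evaluations are (0, 3) and (1, 4), and neither reaches the 5 correct label-b answers that a factor of 2 demands, so the alarm fires.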

misalignment_trace(responses: collections.abc.Sequence[collections.abc.Sequence[int]]) → set[tuple[collections.abc.Sequence[int], bool]]

Test the classifiers’ misalignment at every possible qs setting.

Parameters: responses (Sequence[Sequence[int]]) – The number of label responses by each classifier
Returns: The set of (qs, misalignment_test_result) for all possible qs settings in a test of size Q.
Return type: set[tuple[Sequence[int], bool]]

are_misaligned(responses: collections.abc.Sequence[collections.abc.Sequence[int]]) → bool

Boolean AND of the misalignment trace given responses.

Parameters: responses (Sequence[Sequence[int]]) – The number of label responses by each classifier
Returns: True if the classifiers have no qs setting at which all classifiers satisfy the safety specification, False otherwise.
Return type: bool
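The relationship between misalignment_trace and are_misaligned can be sketched for the binary case. The functions trace and all_misaligned below are hypothetical stand-ins that take the atomic test as a callable; they illustrate the documented contract rather than reproduce ntqr’s code. The qs settings are stored as tuples so they can live in a set.

```python
from collections.abc import Callable, Sequence

AtomicTest = Callable[[Sequence[int]], bool]


def trace(atomic: AtomicTest, Q: int) -> set[tuple[tuple[int, int], bool]]:
    """Binary case: (qs, misaligned?) for every qs = (qa, Q - qa)."""
    return {((qa, Q - qa), atomic((qa, Q - qa))) for qa in range(Q + 1)}


def all_misaligned(atomic: AtomicTest, Q: int) -> bool:
    """Boolean AND over the trace: True only if no qs setting is safe."""
    return all(result for _, result in trace(atomic, Q))


# Hypothetical atomic tests for Q = 4.
def safe_only_at_even_split(qs: Sequence[int]) -> bool:
    return tuple(qs) != (2, 2)


def never_safe(qs: Sequence[int]) -> bool:
    return True


print(sorted(trace(safe_only_at_even_split, 4)))
print(all_misaligned(safe_only_at_even_split, 4))  # False
print(all_misaligned(never_safe, 4))               # True
```

A single safe qs setting in the trace is enough to keep the fully unsupervised alarm quiet.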

check_responses(qs: collections.abc.Sequence[int], responses: collections.abc.Sequence[collections.abc.Sequence[int]]) → bool

Check logical constraints on responses.

1. The label counts in the answer key sum to the size of the test:

sum(qs) = Q

2. Each classifier’s label responses also sum to the test size:

all(sum(classifier_rsps) == Q for classifier_rsps in responses)

Parameters:

qs (Sequence[int]) – Count of each label in the answer key.
responses (Sequence[Sequence[int]]) – The number of label responses by each classifier

Returns:

True if requirements 1 and 2 are satisfied, False otherwise.

Return type:

bool
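The two requirements can be sketched as a free function. This is an illustration, not the method itself: the hypothetical check_responses below takes Q explicitly instead of reading it from the alarm.

```python
from collections.abc import Sequence


def check_responses(Q: int, qs: Sequence[int], responses: Sequence[Sequence[int]]) -> bool:
    """Requirement 1: label counts sum to Q. Requirement 2: each classifier's counts do too."""
    labels_ok = sum(qs) == Q
    responses_ok = all(sum(classifier_rsps) == Q for classifier_rsps in responses)
    return labels_ok and responses_ok


print(check_responses(10, (5, 3, 2), [(4, 4, 2), (6, 2, 2)]))  # True
print(check_responses(10, (5, 3, 2), [(4, 4, 1)]))             # False: responses sum to 9
```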

class ntqr.alarms.LabelsSafetySpecification(factors: collections.abc.Sequence[int])

Simple example of safety specification for each label.

factors

is_satisfied(qs: collections.abc.Sequence[int], correct_responses: collections.abc.Sequence[int])

Check whether correct_responses at the given qs setting satisfy the safety specification.

Parameters:

qs (Sequence[int]) – Count of each label in the answer key.
correct_responses (Sequence[int]) – Assumed number of correct responses for each label, one per label.

Returns:

True if the classifier’s assumed numbers of correct responses satisfy the safety specification, False otherwise. Each assumed label correct count must satisfy factor*q_label_correct - q_label > 0, using that label’s factor.

Return type:

bool
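The label specification reduces to one inequality per label. A minimal sketch as a free function (the name labels_spec_satisfied is hypothetical), assuming factors, qs and correct_responses are aligned by label:

```python
from collections.abc import Sequence


def labels_spec_satisfied(factors: Sequence[int], qs: Sequence[int],
                          correct_responses: Sequence[int]) -> bool:
    """True if factor * q_label_correct - q_label > 0 holds for every label."""
    if not len(factors) == len(qs) == len(correct_responses):
        raise ValueError("factors, qs and correct_responses need one entry per label")
    return all(f * c - q > 0 for f, q, c in zip(factors, qs, correct_responses))


print(labels_spec_satisfied((2, 2, 2), (5, 3, 2), (3, 2, 2)))  # True
print(labels_spec_satisfied((2, 2, 2), (5, 3, 2), (3, 1, 2)))  # False: 2*1 - 3 < 0
```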

pair_safe_evaluations_at_qs(qs: collections.abc.Sequence[int]) → list[collections.abc.Iterator[tuple[int, int]]]

All pair evaluations satisfying the safety specification at the given qs.

Parameters: qs (Sequence[int]) – Number of questions for each label.
Returns: List of iterators, one per label, for the pair evaluations that satisfy the safety specification.
Return type: list[Iterator[tuple[int,int]]]

class ntqr.alarms.GradeSafetySpecification(factors)

Simple example of a grade safety specification.

factors

is_satisfied(qs: list[int], correct_responses: collections.abc.Sequence[int])

Check whether the label accuracies satisfy the safety specification.

Parameters:

qs (list[int]) – Number of questions for each label in the test.
correct_responses (Sequence[int]) – Number of correct responses for each label, one per label.

Return type:

bool

pair_safe_evaluations_at_qs(qs)

All pair evaluations satisfying the safety specification at the given qs.