How many independent evaluation axioms are there?
This notebook counts how many independent evaluation axioms there are for an ensemble of \(N\) classifiers. There are three sets of equations that define the logic of unsupervised evaluation for classifiers. The first two sets of equations (called an ideal in algebraic geometry) define the set of all possible evaluations for a test of size \(Q\). These are the evaluations before we observe any test results.
The third set of equations uses the counts of how the classifiers agreed and disagreed on the test. For a classification test with \(R\) possible labels, there are \(R^N\) ways that \(N\) classifiers can agree and disagree when labeling a single item. Adding the third set of equations to the first two then gives the restricted set of group evaluations that are logically consistent with the test results.
We will go over these sets of equations and use linear algebra to count how many of these axioms are independent. Note that this is very different from asking whether the classifiers were independent in their errors during a test. That is a different kind of independence: classifier independence. This notebook explores axiom independence.
import itertools
import random

import sympy
from IPython.display import display, Math, Latex, HTML

import ntqr
import ntqr.raxioms
import ntqr.statistics  # ResponseVariables is used below
The simplex axioms
The simplex axioms are shown here. These are the equations stating that the count of all events conditioned on a true label must equal the number of times that label appears in the answer key. This is a normalization statement about events. Given \(R\) labels and \(N\) classifiers making decisions, we have a complete description of their decision events (\(R^N\) of them). The count of a label, \(\ell\), in the answer key is denoted by \(Q_{\ell}\).
There are \(R\) of these simplex axioms, one for each label. Together they involve \(R \cdot R^N = R^{N+1}\) variables, since \(N\) classifiers can agree and disagree in \(R^N\) possible ways for each true label.
Let’s test the count of the variables
# We have three classifiers doing three label classification
labels = ('a','b','c')
classifiers = ('i','j','k')
r_vars = ntqr.statistics.ResponseVariables(labels, classifiers)
print(r_vars)
random.choice(list(r_vars.label_responses[random.choice(labels)].values()))
ClassifiersResponseVariables(('a', 'b', 'c'),('i', 'j', 'k'))
nVars = len([var for label in labels
             for var in r_vars.label_responses[label].values()])
nVars == 3**4
True
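The same count can be reproduced without `ntqr`. The sketch below is only an illustration of the counting argument, not the library's internal representation: it enumerates the \(R^N\) joint decisions with `itertools.product` and pairs each one with a true label. The tuple encoding of a variable is an assumption made for this example.

```python
import itertools

labels = ("a", "b", "c")        # R = 3 possible labels
classifiers = ("i", "j", "k")   # N = 3 classifiers

# A joint decision assigns one label to each classifier: R^N tuples.
decisions = list(itertools.product(labels, repeat=len(classifiers)))

# One unknown count per (true label, joint decision) pair: R * R^N = R^(N+1).
# The (true_label, decision) tuple is a hypothetical stand-in for a variable.
variables = [(true_label, decision)
             for true_label in labels
             for decision in decisions]

print(len(decisions))   # 27
print(len(variables))   # 81
```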
The independence of the simplex axioms
Let’s collect all the simplex axioms for the evaluation of \(N\) classifiers. We’ll check that the count we get for our working case of \(N=3\) and \(R=3\) has the expected count.
We then compute the rank of the matrix associated with the simplex axioms (they are all linear equations in the label response variables). Given how we constructed the simplex axioms, we expect all of them to be independent of each other. Is that so?
simplex_axioms = [axiom for axiom in ntqr.raxioms.SimplexAxioms(labels, classifiers).axioms.values()]
n_axioms = len(simplex_axioms) # returns R axioms
n_axioms == len(labels)
True
# Let's look at one of the axioms to exhibit its simple linear structure
random.choice(simplex_axioms)
# Let's use SymPy to check the independence of these equations. We expect the
# rank of the matrix of the linear coefficients to be R = 3, one per simplex axiom.
#
# We need the vars
r_vars = [r_var for label in labels
          for r_var in ntqr.statistics.ResponseVariables(labels, classifiers).label_responses[label].values()]
print(len(r_vars) == len(labels)**(len(classifiers) + 1)) # There are R^(N+1) unknown label response variables
# These are the label response variables for subsets of the classifiers
random.choice(r_vars)
True
# Let's check the rank of the coefficient matrix
A, b = sympy.linear_eq_to_matrix(list(simplex_axioms), r_vars)
A
A.rank() == 3
True
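Why the rank equals \(R\) can be seen from a hand-built version of the same equations. The sketch below does not use `ntqr`; the symbol names (`R_<decision>_<true label>` and `Q_<true label>`) are made up for this example. Each response variable appears in exactly one simplex axiom, so the rows of the coefficient matrix have disjoint support and cannot be linearly dependent.

```python
import itertools
import sympy

labels = ("a", "b", "c")
classifiers = ("i", "j", "k")
decisions = list(itertools.product(labels, repeat=len(classifiers)))

# Hypothetical symbol names, chosen only for this illustration.
response = {
    (true_label, d): sympy.Symbol("R_" + "".join(d) + "_" + true_label)
    for true_label in labels
    for d in decisions
}
Q = {true_label: sympy.Symbol("Q_" + true_label) for true_label in labels}

# One simplex axiom per true label: all decision counts sum to Q_label.
simplex_eqs = [
    sympy.Eq(sum(response[(true_label, d)] for d in decisions), Q[true_label])
    for true_label in labels
]

# Q symbols are not in the unknown list, so they end up on the right-hand side b.
A, b = sympy.linear_eq_to_matrix(simplex_eqs, list(response.values()))
print(A.shape)    # (3, 81)
print(A.rank())   # 3
```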
The observable axioms
Observable axioms involve sums across all true labels. If a pattern of decision by the classifiers occurs n times, that count is an unknown sum of non-negative integers over true labels.
An example is given by,
observable_axioms = [axiom for axiom in ntqr.raxioms.ObservableAxioms(labels,classifiers).axioms]
len(observable_axioms) # Should return R^N
27
random.choice(observable_axioms)
# How many of them are independent of each other?
A, b = sympy.linear_eq_to_matrix(observable_axioms, r_vars)
print(A.rank()) # Should return R^N
A.rank() == len(labels)**(len(classifiers))
# But together, we don't get R^N + R independent axioms, we just get R^N + R - 1
A, b = sympy.linear_eq_to_matrix(simplex_axioms + observable_axioms,r_vars)
A.rank() # There are 30 equations but only 29 independent ones.
29
Before we added the observable axioms, we had 3 independent axioms (simplex axioms). There are 27 observable axioms but they contribute only 26 independent axioms.
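The single lost axiom has a simple explanation: adding up all the simplex axioms and adding up all the observable axioms both give the sum of every response variable, so the two sets share exactly one linear relation. The sketch below checks this on coefficient matrices built by hand. It assumes integer-coded labels and an arbitrary column ordering of the variables, and it does not use `ntqr`.

```python
import itertools
import sympy

R, N = 3, 3
labels = range(R)
decisions = list(itertools.product(labels, repeat=N))

# Column index for each (true label, decision) variable; the ordering is arbitrary.
col = {pair: idx for idx, pair in enumerate(itertools.product(labels, decisions))}

def indicator_row(pairs):
    """Row with a 1 in the column of every variable that appears in the axiom."""
    row = [0] * len(col)
    for pair in pairs:
        row[col[pair]] = 1
    return row

# Simplex axiom for a label: sum over all decisions, true label fixed.
simplex = sympy.Matrix([indicator_row((l, d) for d in decisions) for l in labels])
# Observable axiom for a decision: sum over all true labels, decision fixed.
observable = sympy.Matrix([indicator_row((l, d) for l in labels) for d in decisions])
combined = simplex.col_join(observable)

# Both sets of rows add up to the all-ones row: the shared linear relation.
all_ones = sympy.ones(1, len(col))
print(sympy.ones(1, R) * simplex == all_ones)                   # True
print(sympy.ones(1, len(decisions)) * observable == all_ones)   # True

print(simplex.rank(), observable.rank(), combined.rank())       # 3 27 29
```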
Let’s recapitulate the dimensions and how observing test results reduces the dimensionality of the space of possible evaluations.
To describe the \(R^N\) ways the classifiers can agree and disagree, for each of the \(R\) true labels, we need \(R^{N+1}=81\) variables.
The dimension of the possible evaluations in the integer response space is \(R*(R^N - 1) = 78.\)
Once we have observed the results of a particular test, we can reduce the dimension by \(R^N - 1 = 26\) down to \(52.\)
In general, observing the test results removes a fraction \(1/R\) of the dimensions of the space of possible evaluations, since \((R^N - 1)/(R(R^N - 1)) = 1/R\). That is a third for three-label classification, as in the running example.
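The bookkeeping above is plain arithmetic, so it is easy to repeat for other values of \(R\) and \(N\). The helper below is a small sketch written for this notebook (`evaluation_dimensions` is not part of `ntqr`).

```python
from fractions import Fraction

def evaluation_dimensions(R, N):
    """Return (variables, dimension before test, reduction, dimension after test)."""
    n_vars = R ** (N + 1)        # one variable per (true label, decision)
    before = n_vars - R          # R simplex axioms: R * (R^N - 1)
    reduction = R ** N - 1       # independent axioms added by the observations
    after = before - reduction
    return n_vars, before, reduction, after

n_vars, before, reduction, after = evaluation_dimensions(3, 3)
print(n_vars, before, reduction, after)   # 81 78 26 52
print(Fraction(reduction, before))        # 1/3
```

For binary classification with three classifiers, `evaluation_dimensions(2, 3)` gives 16 variables, a dimension of 14 before the test, and 7 after.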
The marginalization axioms
Given a point in the Q-simplex, once we have the set of points in the \(R\) simplexes for the \(R^N\) decisions classifiers can make, we can compute any marginalized event given true label. Again, because we are dealing with counts for discrete events, we can write these marginalization relations as simple sums. There are no probability assumptions needed for how counts should marginalize.
These marginalization “axioms” are not needed to compute the consistent set for the \(R^N\) decisions given true label. But they come in handy as a way to use linear algebra to compute marginalized simplexes by true label. Rather than coding how any marginalized count can be expressed as the sum of the full event counts, one can use linear algebra to get symbolic expressions for all of them at runtime.
This is done by creating a class that produces how events over \(N\) classifiers marginalize to events over \(N-1\) classifiers. Let’s take a look.
top_marginalization_axioms = [axiom for label in labels
                              for axiom in ntqr.raxioms.MarginalizationAxioms(labels, classifiers).axioms[label]]
# Matrix was never imported on its own, so use the sympy namespace
sympy.Matrix(top_marginalization_axioms)
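To make the "simple sums" concrete, here is a sketch of marginalization on plain integer counts, independent of `ntqr`. The trio counts are randomly generated placeholders for one true label, not results from any test, and `marginalize` is a helper written only for this example.

```python
import itertools
import random

labels = ("a", "b", "c")
classifiers = ("i", "j", "k")
decisions = list(itertools.product(labels, repeat=len(classifiers)))

# Made-up counts of each trio decision, for a single fixed true label.
random.seed(0)
trio_counts = {d: random.randint(0, 5) for d in decisions}

def marginalize(counts, drop_index):
    """Sum counts over the decisions of the classifier at position drop_index."""
    marginal = {}
    for decision, n in counts.items():
        key = decision[:drop_index] + decision[drop_index + 1:]
        marginal[key] = marginal.get(key, 0) + n
    return marginal

pair_ij = marginalize(trio_counts, 2)   # drop classifier k: R^2 pair events
single_i = marginalize(pair_ij, 1)      # drop classifier j: R single events

# Marginalizing never changes the total number of items with this true label.
print(len(pair_ij), len(single_i))
print(sum(trio_counts.values()) == sum(pair_ij.values()) == sum(single_i.values()))
```

Each marginalized count is a sum of \(R\) counts from the level above, which is the linear relation the marginalization axioms encode symbolically.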