ntqr.r2.datasketches
====================

.. py:module:: ntqr.r2.datasketches

.. autoapi-nested-parse::

   @author: Andrés Corrada-Emmanuel.


Attributes
----------

.. autoapisummary::

   ntqr.r2.datasketches.Label
   ntqr.r2.datasketches.Votes
   ntqr.r2.datasketches.VoteCounts
   ntqr.r2.datasketches.LabelVoteCounts
   ntqr.r2.datasketches.VoteFrequencies
   ntqr.r2.datasketches.opposite_label


Classes
-------

.. autoapisummary::

   ntqr.r2.datasketches.TrioLabelVoteCounts
   ntqr.r2.datasketches.TrioVoteCounts
   ntqr.r2.datasketches.ObservedVoteCounts


Functions
---------

.. autoapisummary::

   ntqr.r2.datasketches.classifier_label_votes
   ntqr.r2.datasketches.votes_match
   ntqr.r2.datasketches.classifiers_labels_votes


Module Contents
---------------

.. py:data:: Label

.. py:data:: Votes

.. py:data:: VoteCounts

.. py:data:: LabelVoteCounts

.. py:data:: VoteFrequencies

.. py:data:: opposite_label

.. py:function:: classifier_label_votes(classifier: int, label: Label, vote_patterns: Iterable[Votes]) -> tuple[Votes, Ellipsis]

   All the vote patterns where classifier voted with label.

   :param classifier: Index of the classifier in the vote_patterns.
   :type classifier: int
   :param label: The label voted by the classifier.
   :type label: Label
   :param vote_patterns: The vote patterns to filter.
   :type vote_patterns: Iterable[Votes]

   :returns: All the vote patterns where classifier voted with label.
   :rtype: tuple[Votes, ...]
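
   A minimal sketch of the filtering described here, assuming vote patterns
   are tuples of the labels ``'a'`` and ``'b'``. The helper name
   ``patterns_with_vote`` is hypothetical and is not the library's
   implementation:

   .. code-block:: python

      from itertools import product

      # All eight vote patterns for a trio of binary classifiers.
      trio_patterns = tuple(product("ab", repeat=3))


      def patterns_with_vote(classifier, label, vote_patterns):
          """Keep the patterns in which ``classifier`` voted ``label``."""
          return tuple(
              votes for votes in vote_patterns if votes[classifier] == label
          )


      # Patterns where the classifier at index 1 voted 'b'.
      matches = patterns_with_vote(1, "b", trio_patterns)

   Half of the eight trio patterns survive the filter, since each classifier
   votes a given label in exactly four of them.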


.. py:function:: votes_match(votes: tuple[Label, Ellipsis], classifiers: Iterable[int], labels: Iterable[Label]) -> bool

   Test whether the labels voted by classifiers match votes.

   :param votes: Vote pattern for the trio.
   :type votes: Tuple[Label, ...]
   :param classifiers: Classifiers to check.
   :type classifiers: Iterable[int]
   :param labels: Label voted by each classifier - the voting
                  sub-pattern we want a trio pattern to match.
   :type labels: Iterable[Label]

   :returns: True if every listed classifier voted its corresponding label
             in votes.
   :rtype: bool
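
   One way to picture the test, as a sketch under the assumption that
   classifiers and labels are paired positionally. The name
   ``sub_pattern_matches`` is hypothetical:

   .. code-block:: python

      def sub_pattern_matches(votes, classifiers, labels):
          """True when each listed classifier cast the corresponding label."""
          return all(
              votes[classifier] == label
              for classifier, label in zip(classifiers, labels)
          )


      # Classifiers 0 and 2 both voted 'a' in this pattern.
      agrees = sub_pattern_matches(("a", "b", "a"), (0, 2), ("a", "a"))
      # Classifier 1 voted 'b', so the sub-pattern ('a', 'a') does not match.
      disagrees = sub_pattern_matches(("a", "b", "a"), (0, 1), ("a", "a"))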


.. py:function:: classifiers_labels_votes(classifiers: Iterable[int], labels: Iterable[Label], vote_patterns: Iterable[Votes]) -> tuple[Votes, Ellipsis]

   Trio voting patterns that match labels by classifiers.

   :param classifiers: The indices of the classifiers in the trio
                       to match.
   :type classifiers: Iterable[int]
   :param labels: Voting sub-pattern by classifiers to match.
   :type labels: Iterable[Label]
   :param vote_patterns: The trio voting patterns to search.
   :type vote_patterns: Iterable[Votes]

   :returns: Tuple of trio voting patterns that matched labels
             for the classifiers.
   :rtype: tuple[Votes, ...]
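
   A hedged sketch of the selection, assuming positional pairing of
   classifiers with labels. ``patterns_matching`` is a hypothetical name, and
   the library may implement this differently:

   .. code-block:: python

      from itertools import product

      trio_patterns = tuple(product("ab", repeat=3))


      def patterns_matching(classifiers, labels, vote_patterns):
          """Keep patterns whose votes at ``classifiers`` equal ``labels``."""
          # Materialize the pairing once so one-shot iterables can be reused.
          wanted = dict(zip(classifiers, labels))
          return tuple(
              votes
              for votes in vote_patterns
              if all(votes[c] == label for c, label in wanted.items())
          )


      # Trio patterns in which classifiers 0 and 2 both voted 'a'.
      selected = patterns_matching((0, 2), ("a", "a"), trio_patterns)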


.. py:class:: TrioLabelVoteCounts(label_vote_counts)

   Data class for the by-label aligned votes of three binary classifiers.

   This class is only useful in an experimental setting where one has
   observed a test with **labeled** data.
   Initialized with a Mapping[Label, Mapping[Votes, int]] of the form::

      {
       'a': {('a', 'a', 'a'): int, ..., ('b', 'b', 'b'): int},
       'b': {('a', 'a', 'a'): int, ..., ('b', 'b', 'b'): int}
      }

   DEPRECATED: This class is being replaced by the upcoming LabelVoteCounts,
   which supports an arbitrary number of classifiers.


   .. py:attribute:: vote_patterns


   .. py:attribute:: pairs
      :value: ((0, 1), (0, 2), (1, 2))


   .. py:attribute:: label_vote_counts


   .. py:method:: to_vote_counts() -> VoteCounts

      Turn by-label counts into by-vote-pattern counts.

      Given by-label counts of the form::

         {'a': {..., ('a', 'b', 'a'): x, ...},
          'b': {..., ('a', 'b', 'a'): y, ...}}

      :returns: ``{..., ('a', 'b', 'a'): x + y, ...}``
      :rtype: VoteCounts
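
      The summation can be sketched with ``collections.Counter``. The counts
      below are made up for illustration, and this is not necessarily how
      the method is implemented:

      .. code-block:: python

         from collections import Counter

         # Illustrative by-label counts; only a few patterns are shown.
         label_vote_counts = {
             "a": {("a", "b", "a"): 3, ("a", "a", "a"): 5},
             "b": {("a", "b", "a"): 2, ("b", "b", "b"): 7},
         }

         # Adding each label's mapping sums the counts pattern by pattern.
         vote_counts = Counter()
         for counts_for_label in label_vote_counts.values():
             vote_counts.update(counts_for_label)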


   .. py:method:: to_TrioVoteCounts()

      Return TrioVoteCounts object by summing votes across labels.


   .. py:method:: to_voting_frequency_fractions() -> VoteFrequencies

      Compute observed voting pattern frequencies.

      :rtype: Mapping[Votes, Fraction]


   .. py:method:: to_voting_frequencies_float() -> Mapping[Votes, float]

      Compute observed voting frequencies inexactly, as floats.

      :rtype: Mapping[Votes, float]


   .. py:method:: __getitem__(label)

      Return the vote pattern counts for label.

      :param label: One of 'a' or 'b'.
      :type label: Label

      :returns: The aligned vote counts observed for the given label.
      :rtype: Mapping[Votes, int]


   .. py:method:: flip_classifiers_label_decisions(classifiers: Iterable, label: Label)


.. py:class:: TrioVoteCounts

   Data class to validate the test counts for three binary classifiers.

   Initialized with a Mapping[Votes, int] of the form::

      {('a', 'a', 'a'): int, ..., ('b', 'b', 'b'): int}

   This class is used for evaluation on unlabeled data, where we only have
   access to the aligned decisions of the binary classifiers and have no
   knowledge of the true label of any item that was classified.

   DEPRECATED: This class will be replaced with the ObservedVoteCounts class
   that can handle an arbitrary number of classifiers.


   .. py:attribute:: vote_patterns


   .. py:attribute:: pairs
      :value: ((0, 1), (0, 2), (1, 2))


   .. py:attribute:: vote_counts
      :type:  VoteCounts


   .. py:method:: __post_init__()

      Check we have counts for a valid evaluation of binary classifiers.

      1. No negative counts.
      2. Initialize all possible vote patterns by the trio.
      3. The empty test - all counts zero - is not allowed.
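
      The three checks can be sketched as follows. ``TrioCountsSketch`` is a
      hypothetical stand-in, not the library's class. Raising ``ValueError``
      and zero-filling missing patterns are assumptions about how the checks
      behave:

      .. code-block:: python

         from dataclasses import dataclass
         from itertools import product


         @dataclass
         class TrioCountsSketch:
             """Hypothetical stand-in for illustration only."""

             vote_counts: dict

             def __post_init__(self):
                 # 1. No negative counts.
                 if any(count < 0 for count in self.vote_counts.values()):
                     raise ValueError("vote counts must be non-negative")
                 # 2. Give every trio pattern a count, defaulting to zero.
                 self.vote_counts = {
                     votes: self.vote_counts.get(votes, 0)
                     for votes in product("ab", repeat=3)
                 }
                 # 3. Reject the empty test.
                 if sum(self.vote_counts.values()) == 0:
                     raise ValueError("at least one count must be positive")


         counts = TrioCountsSketch({("a", "a", "b"): 4})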


   .. py:method:: to_frequencies_exact() -> VoteFrequencies

      Turn vote integer counts to exact Fraction objects.

      :returns: Maps each trio vote pattern to its frequency in the test as
                an exact Fraction.
      :rtype: VoteFrequencies
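
      To illustrate the exact arithmetic, here is a sketch using the standard
      library's ``fractions.Fraction``. The counts are made up, and whether
      the class builds its frequencies this way internally is an assumption:

      .. code-block:: python

         from fractions import Fraction

         # Illustrative counts for a test of ten items.
         vote_counts = {
             ("a", "a", "a"): 6,
             ("a", "b", "a"): 3,
             ("b", "b", "b"): 1,
         }
         test_size = sum(vote_counts.values())

         # Each frequency is the pattern count divided by the test size.
         frequencies = {
             votes: Fraction(count, test_size)
             for votes, count in vote_counts.items()
         }

      Because the values are exact rationals, they sum to exactly one, with no
      floating-point rounding.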


   .. py:method:: to_frequencies_float() -> Mapping[Votes, float]

      Compute observed voting pattern frequencies, inexactly, as floats.

      :returns: Maps each trio vote pattern to its frequency in the test
                as an inexact float.
      :rtype: Mapping[Votes, float]


   .. py:method:: classifier_label_frequency(classifier: int, label: Label) -> sympy.Rational

      Calculate classifier label voting frequency.

      :param classifier: The index of the classifier.
      :type classifier: int
      :param label: The label.
      :type label: Label

      :returns: The fraction of test items on which the classifier voted the
                label, that is, its label vote count divided by the test size.
      :rtype: sympy.Rational
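
      A sketch of the computation, using ``fractions.Fraction`` in place of
      ``sympy.Rational`` to stay within the standard library. The function
      name ``label_frequency`` and the counts are hypothetical:

      .. code-block:: python

         from fractions import Fraction

         vote_counts = {
             ("a", "a", "a"): 6,
             ("a", "b", "a"): 3,
             ("b", "b", "b"): 1,
         }


         def label_frequency(classifier, label, vote_counts):
             """Fraction of test items on which ``classifier`` voted ``label``."""
             responses = sum(
                 count
                 for votes, count in vote_counts.items()
                 if votes[classifier] == label
             )
             return Fraction(responses, sum(vote_counts.values()))


         # Classifier 1 voted 'b' on 3 + 1 of the 10 items.
         frequency = label_frequency(1, "b", vote_counts)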


   .. py:method:: classifier_label_responses(classifier: int, label: Label) -> int

      Calculate the number of responses with label by classifier.

      :param classifier: The index of the classifier.
      :type classifier: int
      :param label: The label.
      :type label: Label

      :returns: Number of times the classifier decided an item was label.
      :rtype: int


   .. py:method:: pair_label_frequency(pair: Iterable[int], label: Label) -> sympy.Rational

      Compute frequency of times a pair voted with the same label.

      :param pair: Classifier indices.
      :type pair: Iterable[int]
      :param label: The label.
      :type label: Label

      :returns: The fraction of times a pair of classifiers voted with the
                same label when classifying items in the test.
      :rtype: sympy.Rational


   .. py:method:: pair_label_responses(pair: Iterable[int], label: Label) -> sympy.Rational

      Compute the number of times a pair voted with the same label.

      :param pair: Classifier indices.
      :type pair: Iterable[int]
      :param label: The label.
      :type label: Label

      :returns: Number of items pair voted with the same label.
      :rtype: int


   .. py:method:: pair_frequency_moment(pair: Iterable[int], label: Label) -> sympy.Rational

      Calculate the label classifier pair frequency moment.

      If (i, j) = pair, then this is

      .. math::

         f_{\mathrm{label}_i, \mathrm{label}_j} - f_{\mathrm{label}_i} \, f_{\mathrm{label}_j}

      The fraction of times the classifier pair voted with the same label
      minus the product of their individual label voting frequencies.

      :param pair: The pair of classifiers.
      :type pair: Iterable[int]
      :param label: One of the binary labels.
      :type label: Label

      :returns: The pair frequency moment as an exact rational.
      :rtype: sympy.Rational
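
      A worked sketch of the moment, again with ``fractions.Fraction``
      standing in for ``sympy.Rational``. The helper ``frequency`` and the
      counts are hypothetical:

      .. code-block:: python

         from fractions import Fraction

         vote_counts = {
             ("a", "a", "a"): 6,
             ("a", "b", "a"): 3,
             ("b", "b", "b"): 1,
         }
         test_size = sum(vote_counts.values())


         def frequency(classifiers, label):
             """Fraction of items on which all ``classifiers`` voted ``label``."""
             hits = sum(
                 count
                 for votes, count in vote_counts.items()
                 if all(votes[c] == label for c in classifiers)
             )
             return Fraction(hits, test_size)


         i, j = (0, 1)
         # Joint agreement frequency minus the product of the marginals.
         moment = frequency((i, j), "a") - (
             frequency((i,), "a") * frequency((j,), "a")
         )

      Here the pair agrees on ``'a'`` for 6 of 10 items, while the individual
      frequencies are 9/10 and 6/10, so the moment is 3/5 - 27/50 = 3/50.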


   .. py:method:: label_pairs_frequency_moments(label: Label) -> Mapping[tuple[int, int], sympy.Rational]

      All the label pair frequency moments.


   .. py:method:: trio_frequency_moment() -> sympy.Rational

      Calculate the 3rd frequency moment of a trio of binary classifiers.

      Don't ask.


.. py:class:: ObservedVoteCounts

   Data class to validate the test vote counts for an arbitrary number
   of binary classifiers.

   Initialized with a Mapping[Votes, int] of the form::

      {('a', ..., 'a'): int, ..., ('b', ..., 'b'): int}

   Class used during evaluations with unlabeled data where we do not
   know the true label for each item labeled by the classifiers.


   .. py:attribute:: vote_counts
      :type:  VoteCounts


   .. py:method:: __post_init__()

      Check we have counts for a valid evaluation of binary classifiers.

      0. Determine the size of the ensemble and that all provided
         vote patterns conform to it.
      1. No negative counts.
      2. Initialize all possible vote patterns by the ensemble.
      3. The empty test - all counts zero - is not allowed.
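
      A sketch of these checks for an ensemble of any size. The helper
      ``validated_counts`` is hypothetical. Raising ``ValueError`` and
      zero-filling missing patterns are assumptions, not confirmed behavior
      of the class:

      .. code-block:: python

         from itertools import product


         def validated_counts(vote_counts):
             """Hypothetical helper mirroring the checks listed above."""
             # 0. Infer the ensemble size; every pattern must share it.
             sizes = {len(votes) for votes in vote_counts}
             if len(sizes) != 1:
                 raise ValueError("vote patterns must all have the same length")
             (ensemble_size,) = sizes
             # 1. No negative counts.
             if any(count < 0 for count in vote_counts.values()):
                 raise ValueError("vote counts must be non-negative")
             # 2. Zero-fill every possible pattern for this ensemble size.
             full = {
                 votes: vote_counts.get(votes, 0)
                 for votes in product("ab", repeat=ensemble_size)
             }
             # 3. Reject the empty test.
             if sum(full.values()) == 0:
                 raise ValueError("at least one count must be positive")
             return full


         # Four classifiers give 2**4 = 16 possible vote patterns.
         counts = validated_counts(
             {("a", "b", "a", "b"): 2, ("b", "b", "b", "b"): 5}
         )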


   .. py:method:: to_frequencies_exact() -> VoteFrequencies

      Turn vote integer counts to exact Fraction objects.

      :returns: Maps each vote pattern to its frequency in the test as an
                exact Fraction.
      :rtype: VoteFrequencies


   .. py:method:: to_frequencies_float() -> Mapping[Votes, float]

      Compute observed voting pattern frequencies, inexactly, as floats.

      :returns: Maps each vote pattern to its frequency in the test
                as an inexact float.
      :rtype: Mapping[Votes, float]


   .. py:method:: classifier_label_frequency(classifier: int, label: Label) -> sympy.Rational

      Calculate classifier label voting frequency.

      :param classifier: The index of the classifier.
      :type classifier: int
      :param label: The label.
      :type label: Label

      :returns: The fraction of test items on which the classifier voted the
                label, that is, its label vote count divided by the test size.
      :rtype: sympy.Rational


   .. py:method:: classifier_label_responses(classifier: int, label: Label) -> int

      Calculate the number of responses with label by classifier.

      :param classifier: The index of the classifier.
      :type classifier: int
      :param label: The label.
      :type label: Label

      :returns: Number of times the classifier decided an item was label.
      :rtype: int


   .. py:method:: pair_label_frequency(pair: Iterable[int], label: Label) -> sympy.Rational

      Compute frequency of times a pair voted with the same label.

      :param pair: Classifier indices.
      :type pair: Iterable[int]
      :param label: The label.
      :type label: Label

      :returns: The fraction of times a pair of classifiers voted with the
                same label when classifying items in the test.
      :rtype: sympy.Rational


   .. py:method:: pair_label_responses(pair: Iterable[int], label: Label) -> sympy.Rational

      Compute the number of times a pair voted with the same label.

      :param pair: Classifier indices.
      :type pair: Iterable[int]
      :param label: The label.
      :type label: Label

      :returns: Number of items pair voted with the same label.
      :rtype: int


   .. py:method:: pair_frequency_moment(pair: Iterable[int], label: Label) -> sympy.Rational

      Calculate the label classifier pair frequency moment.

      If (i, j) = pair, then this is

      .. math::

         f_{\mathrm{label}_i, \mathrm{label}_j} - f_{\mathrm{label}_i} \, f_{\mathrm{label}_j}

      The fraction of times the classifier pair voted with the same label
      minus the product of their individual label voting frequencies.

      :param pair: The pair of classifiers.
      :type pair: Iterable[int]
      :param label: One of the binary labels.
      :type label: Label

      :returns: The pair frequency moment as an exact rational.
      :rtype: sympy.Rational


   .. py:method:: label_pairs_frequency_moments(label: Label) -> Mapping[tuple[int, int], sympy.Rational]

      All the label pair frequency moments.


   .. py:method:: trio_frequency_moment() -> sympy.Rational

      Calculate the 3rd frequency moment of a trio of binary classifiers.

      Don't ask.