Thank you! To my knowledge, no links to papers exist — I am describing my own recent insights. I would be excited to work with others, to build and train a deep recurrent neural network using this method.

Covariance is generally found by summing the paired products of instances’ *differences from their means*, and dividing by the number of instances. It is measured between paired inputs, (x, y), over a population of pairs. That is not exactly the situation for “cat-classifiers”, so I have used the term as a misnomer; the details of an actual implementation would be messier.

For the case of a single neuron in our “cat-classifier”, there are a few comparisons that need to occur: the tendency of accurately classified cat images to cause activation of the neuron; the tendency of misclassified cat images to cause activation of the neuron; and the tendency for these two activation levels to be different. (Example: “accurately classified images all had an activation between 0.7 and 0.9. Meanwhile, misclassified images all had an activation between 0.1 and 0.3, for that same neuron. Something is different between them. This would be a good place to train-away errors.”)

Calculations?:

First, we find the mean-average activation among *accurately* classified images of cats, let us call it X, and we subtract this from the activation level of a *particular* accurately classified image, let’s call it x(i). This term, ( x(i)-X ), is then multiplied by the corresponding term for another particular accurately classified image, ( x(j)-X ). If we had paired data for covariance, this second term would instead be ( y(i)-Y ). Notice, though, that covariance of a variable *with itself* is really just ‘variance’: you would do fine with the sum of squares of the differences from the mean. So, compute the statistical variance (and mean) of activation levels, for *both* the set of accurately classified images *and* the set of erroneously classified images.
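As a minimal sketch of that per-group computation (the activation values here are made up to match the 0.7–0.9 vs. 0.1–0.3 example above, and the variable names are mine):

```python
import numpy as np

# Hypothetical activations of one neuron, recorded over cat images.
correct_acts = np.array([0.7, 0.75, 0.8, 0.85, 0.9])  # accurately classified
wrong_acts   = np.array([0.1, 0.15, 0.2, 0.25, 0.3])  # misclassified

# Mean and variance within each group, as described above.
X = correct_acts.mean()          # mean activation, accurate group
Y = wrong_acts.mean()            # mean activation, erroneous group
var_correct = correct_acts.var() # mean of ( x(i)-X )^2
var_wrong   = wrong_acts.var()   # mean of ( y(i)-Y )^2
```

Here `np.var` already does the “sum of squares of differences from the mean, divided by the count” described above.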

The difference between the accurate and erroneous groups could be formulated as a hypothesis test: “Are these two distributions different?” I don’t know if that would be better. I imagined, simply, paired comparisons between members of each group, centered around the average of the two means: an accurate image’s activation x(i) paired with an erroneous image’s activation y(i), so that ( x(i)-((X+Y)/2) ) is multiplied by ( y(i)-((X+Y)/2) ). Sum this product over as many accurate-erroneous pairs as you like, and divide by the number of pairings. The result will be a large negative value only if x(i) and y(i) tended to be on far opposite sides of their shared mean, ((X+Y)/2). No, that’s not a hypothesis test! It does implement covariance, though.
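That paired statistic can be sketched in a few lines (again with made-up activation values matching the earlier example, and my own variable names):

```python
import numpy as np

correct_acts = np.array([0.7, 0.75, 0.8, 0.85, 0.9])  # x(i), accurate group
wrong_acts   = np.array([0.1, 0.15, 0.2, 0.25, 0.3])  # y(i), erroneous group

# Shared center: the average of the two group means, (X+Y)/2.
shared_mean = (correct_acts.mean() + wrong_acts.mean()) / 2

# Pair each x(i) with a y(i), multiply their deviations from the
# shared mean, sum, and divide by the number of pairings.
stat = np.mean((correct_acts - shared_mean) * (wrong_acts - shared_mean))
# stat is strongly negative when the two groups sit on opposite
# sides of the shared mean, as in this example.
```

With these example activations the statistic comes out negative, matching the intuition above that well-separated groups produce a large negative value.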

I am honestly still fumbling for any deeper insights here, and there may be superior ways to collect or construct such a statistic of neuron activity. Pair-wise covariance was my metaphor, not necessarily the best method.