Neural Networks: Orthogonal Transformers

Anthony Repetto
Jun 28, 2021

Alexandra Libby and Timothy Buschman found an important fact about our brains: they keep memories distinct from present sensory stimuli by making certain neurons’ activations ‘orthogonal’. That is, certain measures of activity sit ‘at right angles to each other’, NOT as ‘opposites along the same line’; the signal swaps activity between groups of neurons, veering off in a new direction, uncluttered.
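
To make ‘orthogonal’ concrete, here is a toy numpy illustration (my own sketch, not code or data from the study): an ‘opposite’ pattern reuses the same neurons with flipped sign, while an ‘orthogonal’ pattern moves the activity onto a different group of neurons entirely.

```python
import numpy as np

# A toy 'population activity' vector: which neurons fire for a sensory pattern.
sensory = np.array([1.0, 1.0, 0.0, 0.0])

# 'Opposite along the same line': the same neurons, with flipped sign.
flipped = -sensory

# 'Orthogonal': the activity has swapped onto a different group of neurons.
memory = np.array([0.0, 0.0, 1.0, 1.0])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(sensory, flipped))  # -1.0: colliding on the same axis
print(cosine(sensory, memory))   #  0.0: at right angles, no interference
```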

The Transformer architecture, in all its forms, relies on a measure of vector similarity to decide where to pay attention. My stumbling insight: use orthogonal vectors in the Transformer to encode different ‘states’ of the same sensory information. One vector matches the present stimulus, while an orthogonal vector represents a memory of that same stimulus; another rotation represents expectations, and further rotations can encode numerous aspects of the data, such as conditionals, goals, etc. (So, each quadrant of the n-dimensional space, strictly each orthant, holds its own copy of each object’s vector, rotated into that quadrant to represent that quadrant’s ‘kind’ of data or attention.)
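
Here is a minimal numpy sketch of that idea as I understand it; the two-state layout, the tiny dimensions, and the names (as_present, as_memory) are my own assumptions, not an established architecture. The same content vector is rotated into a second, orthogonal block of dimensions to serve as its ‘memory’ copy, so a dot-product attention query aimed at the present does not register it at all.

```python
import numpy as np

d = 4                                  # size of a base 'content' embedding (toy)
D = 2 * d                              # full dimension: room for two orthogonal 'states'

# Place the content in the first half of the space: the 'present stimulus' copy.
def as_present(x):
    return np.concatenate([x, np.zeros(d)])

# Rotate the same content into the second half: the 'memory' copy. This is a
# fixed block rotation, i.e. the 'swap of activity between groups of neurons'.
def as_memory(x):
    return np.concatenate([np.zeros(d), x])

x = np.array([0.5, -1.0, 0.25, 2.0])   # some object's embedding

present_key = as_present(x)
memory_key = as_memory(x)
query = as_present(x)                  # an attention query 'about the present'

print(query @ present_key)             # > 0: full overlap, strong attention
print(query @ memory_key)              # 0.0: the orthogonal copy doesn't interfere
```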

Let Transformers listen to every vector that is aligned OR orthogonal, to encode myriad types of attention that overlap without impeding each other. Neuroscientists say it’s how we avoid forgetting short-term memories, so it might be worth a try! :)
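
One last toy sketch (again my own construction, with a hypothetical helper named lift, not a tested model): several ‘types’ of attention share one sequence by each reading from its own orthogonal block, so every query lights up only the tokens stored in its own state, and the other states stay out of its way.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 4                                      # base content dimension (toy)
states = ["present", "memory", "expectation"]
D = d * len(states)                        # one orthogonal block per 'kind' of attention

# Rotate content into the block belonging to a given state.
def lift(x, state_idx):
    out = np.zeros(D)
    out[state_idx * d:(state_idx + 1) * d] = x
    return out

# A sequence of three tokens, each carrying its content in a different state.
content = rng.standard_normal((3, d))
keys = np.stack([lift(content[i], i) for i in range(len(states))])

# One attention 'type' per state: its query only overlaps that state's block,
# so all three kinds of attention act on the same sequence without clutter.
for i, name in enumerate(states):
    query = lift(content[i], i)            # ask about this state's own content
    weights = softmax(keys @ query / np.sqrt(d))
    print(name, np.round(weights, 2))      # weight concentrates on that state's token
```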
