Anthony Repetto
Nov 29, 2022


If you instantiated each item as 90° rotations between the usual basis vectors, such that a 'cat' would automatically exist as 'seeing a cat' / 'remembering a cat' / 'goal is a cat' / 'constraint: avoid a cat' / ... then, yes, that amounts to multiple attention heads!
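
Here is a minimal numpy sketch of what I mean, with every name (the role list, the `embed` helper, the block layout, the toy dimension) invented purely for illustration: the same 'cat' vector gets placed into disjoint, mutually orthogonal blocks of a larger activation space, one block per role, so 'see cat' and 'remember cat' carry identical content without overlapping.

```python
import numpy as np

# Hypothetical layout: one d-dimensional concept embedding, and one
# orthogonal "role" block per way of attending to that concept.
d = 4                                    # toy concept-embedding size
roles = ["see", "remember", "goal", "avoid"]
cat = np.random.randn(d)                 # stand-in 'cat' embedding

def embed(concept, role_index, n_roles=len(roles)):
    """Place `concept` into the role_index-th orthogonal block of a
    (n_roles * d)-dimensional activation space."""
    out = np.zeros(n_roles * d)
    out[role_index * d:(role_index + 1) * d] = concept
    return out

see_cat = embed(cat, roles.index("see"))
remember_cat = embed(cat, roles.index("remember"))

# Same concept, different role: the two vectors are exactly orthogonal.
assert np.isclose(see_cat @ remember_cat, 0.0)
```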

However, that nuance of 'rotation from one basis to the next' means that each attention head is an exact map of the others, and the operation for converting among them is simple, even though each lives in a different subspace. I was only hoping this might inspire a niche speed-up for transformers, not proposing that it would alter capabilities significantly. Just "huh, the brain might be doing things more efficiently by layering the attention for 'see cat' and 'remember cat' *orthogonal* to each other in the activation-space."
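
Continuing the toy sketch above (again, all numbers and block offsets are made up for illustration): the conversion between 'see cat' and 'remember cat' is one fixed orthogonal matrix, a 90° rotation in each paired plane, applied as a single matmul, so nothing new is learned when the role changes.

```python
import numpy as np

# Same hypothetical block layout as before: 4 roles x 4 dims.
d, n_roles = 4, 4
n = d * n_roles
see, rem = 0 * d, 1 * d                  # offsets of the 'see' and 'remember' blocks

cat = np.random.randn(d)
see_cat = np.zeros(n); see_cat[see:see + d] = cat
remember_cat = np.zeros(n); remember_cat[rem:rem + d] = cat

# Build R: identity everywhere except a 90° rotation in each
# (see_i, remember_i) plane, turning the 'see' block into 'remember'.
R = np.eye(n)
for i in range(d):
    R[see + i, see + i] = R[rem + i, rem + i] = 0.0
    R[rem + i, see + i] = 1.0            # e_see_i  ->  e_rem_i
    R[see + i, rem + i] = -1.0           # e_rem_i  -> -e_see_i

assert np.allclose(R @ see_cat, remember_cat)   # exact map between the two roles
assert np.allclose(R.T @ R, np.eye(n))          # R is orthogonal: a pure rotation
```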
