Anthony Repetto
1 min read · Nov 18, 2021


Erm, I never claimed to have a superior method; I only identified a path which *might* be worth exploring. Prior to the equations which so succinctly demonstrate an idea, there is first the *thought* that it may be possible. If you need all the equations before you can begin, then you should stick to implementing others' equations. However, if you can adapt a *concept* to suit your needs, that is what I offered.

For clarity, the relationship between vectors is a simple quarter-turn along each of the basis dimensions. It merely allows you to distinguish an instance of a cat that you *currently observe* from a *memory* of a cat, or a cat as a *goal*, etc. Transformers allocate attention according to vector similarity; this quarter-turned vector would have zero similarity to the original, so it would *not* be confused with the separate instance type until you rotate the set of attention keys in the appropriate direction. All objects "currently observed" can be attended to distinctly from all the objects "being remembered". I'm sure you can construct a model from that description, and no proof of convergence or re-derivation of back-prop is necessary.
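As a minimal sketch of that description (not a definitive implementation, and assuming adjacent embedding dimensions are paired up for the rotation), a quarter-turn applied to each pair makes the rotated copy orthogonal to the original, so dot-product attention gives it zero weight; rotating the query (or keys) the same way restores full similarity. All names here are illustrative.

```python
import numpy as np

def quarter_turn(x):
    """Rotate each consecutive pair of dimensions (x0, x1), (x2, x3), ... by 90 degrees."""
    out = np.empty_like(x)
    out[0::2] = -x[1::2]   # new even coordinate = -old odd coordinate
    out[1::2] = x[0::2]    # new odd coordinate  =  old even coordinate
    return out

cat = np.random.randn(8)            # a hypothetical embedding for "cat"
observed_cat = cat                  # instance type: currently observed
remembered_cat = quarter_turn(cat)  # same content, tagged as a memory

# Un-normalized dot-product similarity, as attention would compute it:
print(np.dot(observed_cat, remembered_cat))   # 0: the two instance types don't mix

# Rotate the query the same way to attend to "remembered" items instead:
rotated_query = quarter_turn(observed_cat)
print(np.dot(rotated_query, remembered_cat))  # equals dot(cat, cat): full similarity
```

The orthogonality is exact: for each pair, the two terms of the dot product are the same product with opposite signs, so they cancel regardless of the embedding's values.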
