Positional Encoding in Transformer

With the advancement of large language models (LLMs), the importance of the context length they can handle has become increasingly apparent. Let’s take a look at how positional encoding has evolved over the years to improve the context-processing capability of LLMs.

Vanilla Positional Encoding

Why does the Transformer need positional encoding? The Transformer contains no recurrence and no convolution. To help the model make use of the order of the sequence, the vanilla Transformer (Vaswani et al....
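As a quick illustration, here is a minimal sketch of the fixed sinusoidal positional encoding introduced in the original Transformer paper, written with NumPy; the function name and shapes are illustrative and assume an even `d_model`:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal positional encoding (Vaswani et al., 2017).

    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    Assumes d_model is even.
    """
    positions = np.arange(seq_len)[:, None]               # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]              # shape (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions get cosine
    return pe

# Usage sketch: the encoding is simply added to the token embeddings
# before the first Transformer layer, e.g.
#   x = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```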

March 10, 2023 · 2 min · Loong