YaRN: Efficient Context Window Extension of Large Language Models
Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models
fail to generalize past the sequence length they were trained on.
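To make the premise concrete, below is a minimal NumPy sketch of the standard RoPE formulation (Su et al.): each pair of channels in a query or key vector is rotated by an angle proportional to its token position, so attention scores depend only on relative position. The function name, the base value 10000, and the toy dimensions are conventional defaults for illustration, not details taken from this paper.

```python
import numpy as np

def rope_rotate(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply RoPE to x of shape (seq_len, head_dim); head_dim must be even."""
    seq_len, dim = x.shape
    assert dim % 2 == 0, "head_dim must be even"
    # Per-pair rotation frequencies: theta_i = base^(-2i/dim).
    theta = base ** (-np.arange(0, dim, 2) / dim)        # (dim/2,)
    angles = positions[:, None] * theta[None, :]         # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                      # even/odd channel pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                   # 2-D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# The dot product of a rotated query at position m and a rotated key at
# position n depends only on the offset m - n.
rng = np.random.default_rng(0)
q = rng.standard_normal((1, 64))
k = rng.standard_normal((1, 64))
s1 = rope_rotate(q, np.array([5]))   @ rope_rotate(k, np.array([2])).T
s2 = rope_rotate(q, np.array([105])) @ rope_rotate(k, np.array([102])).T
print(np.allclose(s1, s2))  # True: both pairs have offset m - n = 3
```

At inference positions beyond the training length, the angles `m * theta_i` fall outside the range the model has ever seen, which is one way to view the generalization failure the abstract describes.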