Recent Trends with Text Embeddings: Decoder-Only LLMs
Introduction

Fine-tuned BERT-style bidirectional encoders, such as E5, have long dominated text embedding, delivering state-of-the-art performance on many sequence-level tasks at an accessible cost. Currently, however, many of the top-performing models on the MTEB leaderboard, a text embedding benchmark covering a variety of tasks (retrieval, clustering, classification, etc.), are decoder-only language models at the ~7B-parameter scale. These models leverage larger context windows and pretraining on web-scale data. This post reviews the papers behind the following models:...