databricks-ml-examples

E5 Model family

E5 is a family of text embedding models introduced in the paper "Text Embeddings by Weakly-Supervised Contrastive Pre-training" by Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei (arXiv, 2022).

Model List

E5 models have the following variations:

| Model                 | Model Size (GB) | Embedding Dimensions |
|-----------------------|-----------------|----------------------|
| intfloat/e5-large-v2  | 1.34            | 1024                 |
| intfloat/e5-base-v2   | 0.44            | 768                  |
| intfloat/e5-small-v2  | 0.13            | 384                  |
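
The embedding dimension largely determines how much space the resulting vectors take up, which can help when choosing between the variants. The following is a rough sketch of that arithmetic, not part of the original examples. It assumes vectors are stored as uncompressed float32 values (4 bytes each), and the helper name `embedding_storage_bytes` is hypothetical.

```python
# Embedding dimensions copied from the model table above.
E5_EMBEDDING_DIMS = {
    "intfloat/e5-large-v2": 1024,
    "intfloat/e5-base-v2": 768,
    "intfloat/e5-small-v2": 384,
}

# Assumption: each vector component is stored as an uncompressed float32.
BYTES_PER_FLOAT32 = 4


def embedding_storage_bytes(model_name: str, num_texts: int) -> int:
    """Estimate the raw storage needed for `num_texts` embeddings."""
    if num_texts < 0:
        raise ValueError("num_texts must be non-negative")
    return num_texts * E5_EMBEDDING_DIMS[model_name] * BYTES_PER_FLOAT32


# Example: compare raw vector storage for one million texts.
for name in E5_EMBEDDING_DIMS:
    gib = embedding_storage_bytes(name, 1_000_000) / 1024**3
    print(f"{name}: {gib:.2f} GiB")
```

Under these assumptions, halving the embedding dimension halves the raw vector storage. Index structures and metadata used by a vector store add overhead that this estimate does not cover.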

FAQ

1. Do I need to add the prefixes "query: " and "passage: " to input texts?

Yes. The model was trained with these prefixes, so omitting them degrades performance.
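
As an illustration of the prefixing described above, here is a minimal sketch. The helper names `add_e5_prefix` and `embed_with_e5` are hypothetical, and the embedding step assumes the `sentence-transformers` package is installed and that the model can be downloaded from the Hugging Face Hub. The sample texts are made up for illustration.

```python
from typing import List

QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def add_e5_prefix(texts: List[str], prefix: str) -> List[str]:
    """Prepend an E5 prefix to each text, leaving already-prefixed texts alone."""
    if prefix not in (QUERY_PREFIX, PASSAGE_PREFIX):
        raise ValueError(f"unexpected prefix: {prefix!r}")
    return [t if t.startswith(prefix) else prefix + t for t in texts]


def embed_with_e5(texts: List[str], model_name: str = "intfloat/e5-small-v2"):
    """Embed already-prefixed texts.

    Assumption: sentence-transformers is installed; it is imported lazily
    so the prefix helper above works without it.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts, normalize_embeddings=True)


# Illustrative inputs (not from the original examples).
queries = add_e5_prefix(["how do I reset my password"], QUERY_PREFIX)
passages = add_e5_prefix(
    ["Open Settings, choose Account, then select Reset password."],
    PASSAGE_PREFIX,
)
print(queries)
print(passages)
```

Keeping the prefix logic in one helper makes it harder to forget the prefix on one side of a query/passage pair. Calling `embed_with_e5(queries)` and `embed_with_e5(passages)` would then produce vectors that can be compared by dot product, since they are normalized.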

Here are some rules of thumb: