databricks-ml-examples

Example notebooks for the BGE model family on Databricks

Model List

BGE is short for BAAI General Embedding.

| Model | Language | Query instruction for retrieval* |
|---|---|---|
| BAAI/bge-large-en-v1.5 | English | Represent this sentence for searching relevant passages: |
| BAAI/bge-base-en-v1.5 | English | Represent this sentence for searching relevant passages: |
| BAAI/bge-small-en-v1.5 | English | Represent this sentence for searching relevant passages: |
| BAAI/bge-large-zh-v1.5 | Chinese | 为这个句子生成表示以用于检索相关文章: |
| BAAI/bge-base-zh-v1.5 | Chinese | 为这个句子生成表示以用于检索相关文章: |
| BAAI/bge-small-zh-v1.5 | Chinese | 为这个句子生成表示以用于检索相关文章: |

*: If you need to retrieve long relevant passages for a short query (the s2p retrieval task), add the instruction to the query; in all other cases, use the original query directly with no instruction. In all cases, no instruction needs to be added to passages.
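
The rule in the note above can be sketched in a few lines of Python. This is only an illustration, not code from the notebooks: the names `QUERY_INSTRUCTIONS`, `prepare_queries`, and `prepare_passages` are hypothetical, and joining the English instruction and the query with a single space is an assumption.

```python
from typing import Dict, List

# Query instructions copied from the model table above.
# ASSUMPTION: the English instruction is joined to the query with one space.
QUERY_INSTRUCTIONS: Dict[str, str] = {
    "en": "Represent this sentence for searching relevant passages: ",
    "zh": "为这个句子生成表示以用于检索相关文章:",
}


def prepare_queries(
    queries: List[str], language: str = "en", s2p: bool = True
) -> List[str]:
    """Prefix each query with the retrieval instruction for s2p retrieval.

    When s2p is False (any other task), queries are returned unchanged.
    """
    if not s2p:
        return list(queries)
    instruction = QUERY_INSTRUCTIONS[language]
    return [instruction + query for query in queries]


def prepare_passages(passages: List[str]) -> List[str]:
    """Passages never get an instruction, so they pass through unchanged."""
    return list(passages)


if __name__ == "__main__":
    for text in prepare_queries(["what is a vector store?"]):
        print(text)
```

The prepared strings would then be passed to whichever embedding call is in use, for example the `encode` method of a `SentenceTransformer` model if the sentence-transformers library is loaded. Keeping the prefixing step separate from encoding makes it easy to confirm that passages are embedded without the instruction.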

Example notebooks

This folder contains the following examples for BGE models:

| File | Description | GPU Minimum Requirement |
|---|---|---|
| 01_load_inference | Environment setup and suggested configurations for running inference with BGE models on Databricks. | 1xT4 |
| 02_mlflow_logging_inference | Save, register, and load BGE models with MLflow, and create a Databricks model serving endpoint. | 1xT4 |
| 03_build_document_index | Build a vector store with FAISS using BGE models. | 1xT4 |
| 04_fine_tune_embedding | Fine-tune BGE models. | 1xT4 |