Embedding Registry
In LanceDB OSS, you can get a supported embedding function from the registry and use it in your table schema. Once configured, the embedding function automatically generates embeddings when you insert data into the table. When you query the table, you can pass a query string (or other input), and the embedding function generates the query embedding for you.
Using an embedding function
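The insert-and-query workflow described in the introduction can be modeled with a small, stdlib-only toy. This is a sketch under stated assumptions, not LanceDB's API: `Registry`, `HashEmbedding`, and `ToyTable` are hypothetical names, and the trigram-hashing "model" stands in for a real provider. It shows the two properties the text describes: a function obtained from a registry and configured with `.create()`, and one function that embeds both the inserted rows and the query string.

```python
import hashlib
import math
from typing import Dict, List, Sequence


class HashEmbedding:
    """Hypothetical embedding function: hashes character trigrams into a
    fixed-size, unit-normalized vector. A real provider calls a model here."""

    def __init__(self, name: str = "toy-hash", ndims: int = 16):
        self.name = name
        self._ndims = ndims

    @classmethod
    def create(cls, **kwargs) -> "HashEmbedding":
        # Mirrors the "get from registry, then .create()" pattern.
        return cls(**kwargs)

    def ndims(self) -> int:
        return self._ndims

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self._ndims
            padded = f"  {text.lower()}  "
            for i in range(len(padded) - 2):
                digest = hashlib.sha256(padded[i:i + 3].encode("utf-8")).digest()
                vec[digest[0] % self._ndims] += 1.0
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])
        return vectors


class Registry:
    """Hypothetical registry mapping a provider name to an embedding class."""

    def __init__(self) -> None:
        self._classes: Dict[str, type] = {}

    def register(self, key: str, cls: type) -> None:
        self._classes[key] = cls

    def get(self, key: str) -> type:
        return self._classes[key]


class ToyTable:
    """Hypothetical table that embeds a source column on insert and embeds
    query strings on search, using the same embedding function both times."""

    def __init__(self, func: HashEmbedding, source_field: str = "text"):
        self.func = func
        self.source_field = source_field
        self.rows: List[dict] = []

    def add(self, records: List[dict]) -> None:
        # Embeddings are generated automatically from the source field.
        vectors = self.func.embed([r[self.source_field] for r in records])
        for record, vector in zip(records, vectors):
            self.rows.append({**record, "vector": vector})

    def search(self, query: str, limit: int = 3) -> List[dict]:
        (q,) = self.func.embed([query])
        # Vectors are unit-normalized, so cosine distance is 1 - dot product.
        scored = [
            {**row, "_distance": 1.0 - sum(a * b for a, b in zip(q, row["vector"]))}
            for row in self.rows
        ]
        scored.sort(key=lambda r: r["_distance"])
        return scored[:limit]


registry = Registry()
registry.register("toy-hash", HashEmbedding)
func = registry.get("toy-hash").create(ndims=32)

table = ToyTable(func)
table.add([{"text": "hello world"}, {"text": "goodbye moon"}, {"text": "hello there"}])
print(table.search("hello world", limit=1)[0]["text"])
```

The design point the toy makes explicit: because the table holds a reference to the embedding function, callers never handle vectors directly, and ingestion and queries cannot drift apart.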
The .create() method accepts several arguments that configure the embedding function’s behavior. max_retries is a special argument that applies to all providers.
| Argument | Type | Description |
|---|---|---|
| name | str | The name of the model to use (e.g., text-embedding-3-small). |
| max_retries | int | The maximum number of times to retry a failed API request. Defaults to 7. |

Depending on the provider, the following arguments may also be available:

| Argument | Type | Description |
|---|---|---|
| batch_size | int | The number of inputs to process in a single batch. Provider-specific. |
| api_key | str | The API key for the embedding provider. Can also be set via an environment variable. |
| device | str | The device to run the model on (e.g., “cpu”, “cuda”). Defaults to automatic detection. |
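The max_retries and batch_size arguments describe logic that any remote provider needs: split the inputs into batches and retry a request that fails. The sketch below shows one way that can work. `embed_in_batches` and `TransientError` are hypothetical names, and the exponential backoff schedule is an assumption; LanceDB's own retry policy may differ. Here max_retries counts retries after the first attempt, so each batch is tried at most max_retries + 1 times.

```python
import time
from typing import Callable, List, Sequence

Vector = List[float]


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (rate limits, timeouts)."""


def embed_in_batches(
    texts: Sequence[str],
    call_api: Callable[[List[str]], List[Vector]],
    batch_size: int = 32,
    max_retries: int = 7,
    base_delay: float = 0.5,
) -> List[Vector]:
    """Embed texts batch by batch, retrying each failed batch up to
    max_retries times with exponential backoff (an assumed policy)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    results: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        for attempt in range(max_retries + 1):
            try:
                results.extend(call_api(batch))
                break
            except TransientError:
                if attempt == max_retries:
                    raise  # out of retries: surface the last error
                time.sleep(base_delay * 2 ** attempt)
    return results


# Usage: a fake API that fails twice before succeeding.
calls = {"n": 0}


def flaky_api(batch: List[str]) -> List[Vector]:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TransientError("rate limited")
    return [[float(len(t))] for t in batch]


vectors = embed_in_batches(["a", "bb", "ccc"], flaky_api, batch_size=2, max_retries=3, base_delay=0.0)
```

Retrying per batch rather than per call means one rate-limited request does not force the whole input to be re-sent.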
Embedding model providers
LanceDB supports most popular embedding providers.
Text embeddings
| Provider | Model ID | Default Model |
|---|---|---|
| OpenAI | openai | text-embedding-ada-002 |
| Sentence Transformers | sentence-transformers | all-MiniLM-L6-v2 |
| Hugging Face | huggingface | colbert-ir/colbertv2.0 |
| Cohere | cohere | embed-english-v3.0 |
| … | … | … |
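One way to read the table: the Model ID column is the name you look up in the registry, and the Default Model column is the model used when .create() is called without a name. A minimal sketch of that lookup, assuming the default applies whenever name is omitted; `DEFAULT_TEXT_MODELS` and `resolve_model` are hypothetical names, and the elided "…" rows are deliberately left out.

```python
from typing import Optional

# Registry names and default models copied from the table above
# (rows elided with "…" are intentionally not included).
DEFAULT_TEXT_MODELS = {
    "openai": "text-embedding-ada-002",
    "sentence-transformers": "all-MiniLM-L6-v2",
    "huggingface": "colbert-ir/colbertv2.0",
    "cohere": "embed-english-v3.0",
}


def resolve_model(provider: str, name: Optional[str] = None) -> str:
    """Return the explicit model name if one is given, otherwise the
    provider's default from the table."""
    if provider not in DEFAULT_TEXT_MODELS:
        raise KeyError(f"unknown text embedding provider: {provider!r}")
    return name or DEFAULT_TEXT_MODELS[provider]


print(resolve_model("huggingface"))
print(resolve_model("openai", name="text-embedding-3-small"))
```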
Multimodal embeddings
| Provider | Model ID | Supported Inputs |
|---|---|---|
| OpenCLIP | open-clip | Text, Images |
| ImageBind | imagebind | Text, Images, Audio, Video |
| … | … | … |
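The Supported Inputs column matters when choosing a provider: a text query can only retrieve audio or video if one function embeds both kinds of input. The sketch below encodes the listed rows so you can pick a provider or fail early before ingestion. `SUPPORTED_INPUTS`, `providers_for`, and `check_supported` are hypothetical helpers, not LanceDB API, and the elided "…" rows are not included.

```python
from typing import Dict, List, Set

# Supported inputs per registry name, copied from the table above
# (elided "…" rows are intentionally omitted).
SUPPORTED_INPUTS: Dict[str, Set[str]] = {
    "open-clip": {"text", "image"},
    "imagebind": {"text", "image", "audio", "video"},
}


def providers_for(modality: str) -> List[str]:
    """Return the listed providers that can embed the given input type."""
    return sorted(p for p, inputs in SUPPORTED_INPUTS.items() if modality in inputs)


def check_supported(provider: str, modality: str) -> None:
    """Raise before ingestion if a provider cannot embed this input type."""
    inputs = SUPPORTED_INPUTS.get(provider)
    if inputs is None:
        raise KeyError(f"unknown multimodal provider: {provider!r}")
    if modality not in inputs:
        raise ValueError(
            f"{provider} cannot embed {modality!r}; supported: {sorted(inputs)}"
        )


print(providers_for("audio"))
```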
Embeddings in LanceDB Cloud and Enterprise
Currently, the embedding registry on LanceDB Cloud and Enterprise supports automatic embedding generation during data ingestion: embeddings are generated on the client side and stored in the remote table. Automatic query-time embedding generation is not yet supported, though it is planned for a future release. For now, generate the query embedding manually with the same embedding function that was used during ingestion, and pass the resulting vector to the search function.
Custom Embedding Functions
You can always implement your own embedding function by inheriting from TextEmbeddingFunction (for text) or EmbeddingFunction (for multimodal data).
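The shape of a custom text embedding function can be sketched without LanceDB installed. `ToyTextEmbeddingFunction` below is a hypothetical stand-in for TextEmbeddingFunction, assuming a subclass supplies ndims() and generate_embeddings() while the base class derives source and query embeddings from them; check the LanceDB API reference for the exact abstract methods and registration step the real base class expects. `CharClassEmbedding` is a hypothetical toy model, not a useful embedder.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class ToyTextEmbeddingFunction(ABC):
    """Hypothetical base class: subclasses implement ndims() and
    generate_embeddings(); source and query embedding build on those."""

    @abstractmethod
    def ndims(self) -> int: ...

    @abstractmethod
    def generate_embeddings(self, texts: Sequence[str]) -> List[List[float]]: ...

    def compute_source_embeddings(self, texts: Sequence[str]) -> List[List[float]]:
        vectors = self.generate_embeddings(list(texts))
        # Guard against a subclass returning vectors of the wrong width.
        for v in vectors:
            if len(v) != self.ndims():
                raise ValueError(f"expected {self.ndims()} dims, got {len(v)}")
        return vectors

    def compute_query_embeddings(self, query: str) -> List[List[float]]:
        # Queries go through the same model as the source data.
        return self.compute_source_embeddings([query])


class CharClassEmbedding(ToyTextEmbeddingFunction):
    """Toy embedder: counts vowels, consonants, digits, and whitespace."""

    def ndims(self) -> int:
        return 4

    def generate_embeddings(self, texts: Sequence[str]) -> List[List[float]]:
        out = []
        for text in texts:
            lower = text.lower()
            vowels = sum(c in "aeiou" for c in lower)
            consonants = sum(c.isalpha() and c not in "aeiou" for c in lower)
            digits = sum(c.isdigit() for c in lower)
            spaces = sum(c.isspace() for c in lower)
            out.append([float(vowels), float(consonants), float(digits), float(spaces)])
        return out


embedder = CharClassEmbedding()
doc_vectors = embedder.compute_source_embeddings(["hello", "room 101"])
query_vector = embedder.compute_query_embeddings("ab 1")[0]
```

Keeping the query path on the same instance is also what the Cloud and Enterprise section asks for when you embed queries manually.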