Integration: FastEmbed

Use the FastEmbed embedding models

Authors
Nicola Procopio

Overview

FastEmbed is a lightweight, fast Python library built for embedding generation.

  1. Light & Fast
     • Quantized model weights
     • ONNX Runtime for inference via Optimum
  2. Accuracy/Recall

Installation

pip install fastembed-haystack

Usage

Components

You can use FastEmbed models with two components: FastembedTextEmbedder and FastembedDocumentEmbedder.

To create semantic embeddings for documents, use FastembedDocumentEmbedder in your indexing pipeline. For generating embeddings for queries, use FastembedTextEmbedder.

Example

The example below embeds documents with FastembedDocumentEmbedder and writes them to an InMemoryDocumentStore, then builds a query pipeline that combines FastembedTextEmbedder with InMemoryEmbeddingRetriever:

from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder, FastembedTextEmbedder

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
    Document(content="fastembed is supported by and maintained by Qdrant."),
]

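# Indexing: embed the documents and write them to the store.
# warm_up() loads the embedding model before the first run.
document_embedder = FastembedDocumentEmbedder()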
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

# Querying: embed the query text, then retrieve documents by embedding similarity.
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", FastembedTextEmbedder())
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who supports fastembed?"

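result = query_pipeline.run({"text_embedder": {"text": query}})

# The retriever output holds the ranked documents; print the top match.
print(result["retriever"]["documents"][0].content)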
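
The document store above is configured with embedding_similarity_function="cosine". As a rough illustration of what cosine ranking means, here is a minimal pure-Python sketch. It is not the InMemoryEmbeddingRetriever implementation: the helper names cosine_similarity and rank_by_cosine and the three-dimensional toy vectors are assumptions made for this example, and real FastEmbed embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_embedding, doc_embeddings, top_k=2):
    """Return (score, index) pairs for the top_k most similar documents."""
    scored = [
        (cosine_similarity(query_embedding, emb), idx)
        for idx, emb in enumerate(doc_embeddings)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for real model output.
toy_docs = [
    [1.0, 0.0, 0.0],  # same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.6, 0.8, 0.0],  # partially aligned with the query
]
toy_query = [1.0, 0.0, 0.0]

top = rank_by_cosine(toy_query, toy_docs, top_k=2)
for score, idx in top:
    print(f"doc {idx}: score {score:.2f}")
```

Because cosine similarity depends only on the direction of the vectors, documents are ranked by how closely their embeddings point the same way as the query embedding, regardless of vector length.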

License

fastembed-haystack is distributed under the terms of the Apache-2.0 license.