Elasticsearch: Introducing retrievers - Searching for everything

2024-07-12

Authors: Jeff Vestal, Jack Conradson (Elastic)

In 8.14, Elastic introduced a new search capability in Elasticsearch called “retrievers.” Read on to learn about their simplicity and efficiency, and how they can supercharge your search operations.

Retrievers are a new abstraction layer added to the search API in Elasticsearch. They provide the convenience of configuring multi-stage retrieval pipelines in a single _search API call. This architecture simplifies the search logic in your application by eliminating the need for multiple Elasticsearch API calls for complex search queries. It also reduces the need for client-side logic, which is often required to combine results from multiple queries.

Initial types of retrievers

The initial release includes three types of retrievers. Each retriever is designed for a specific purpose, and when combined, they can perform complex search operations.

The available types are:

standard - Returns the top documents from a traditional query. This type is backward compatible: it supports the existing query DSL request syntax, so you can migrate to the retriever framework at your own pace.
kNN - Returns the top documents from a kNN search.
RRF - Combines and ranks the results of multiple first-stage retrievers into a single result set using the reciprocal rank fusion (RRF) algorithm, with little or no user tuning. The RRF retriever is a compound retriever whose filter elements are propagated to its child retrievers.
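
To make the RRF description concrete, here is a minimal Python sketch of the reciprocal rank fusion formula, in which each document scores the sum of 1 / (rank_constant + rank) over the child result lists that contain it. It is an illustration only, not Elasticsearch's implementation: the function name rrf_fuse, the toy document IDs, and the tie-breaking rule are assumptions made for this example.

```python
from collections import defaultdict


def rrf_fuse(result_lists, *, rank_constant, rank_window_size):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Illustrative sketch only; ties are broken by document ID here,
    which is an assumption and not necessarily what Elasticsearch does.
    """
    scores = defaultdict(float)
    for results in result_lists:
        # Only the top rank_window_size hits of each child list take part.
        for rank, doc_id in enumerate(results[:rank_window_size], start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy ranked lists standing in for three child retrievers.
lexical = ["a", "b", "c"]
dense = ["b", "a", "d"]
sparse = ["b", "c", "a"]

fused = rrf_fuse([lexical, dense, sparse], rank_constant=1, rank_window_size=5)
for doc_id, score in fused:
    print(doc_id, round(score, 4))
```

Document "b" wins because it ranks first in two of the three lists, even though it is only second in the lexical list. A smaller rank_constant gives top-ranked documents more weight relative to lower-ranked ones.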

How are retrievers different and why are they useful?

With traditional queries, the query is part of the overall search API call. Retrievers are different in that they are designed as independent entities that can be used alone or easily combined. This modular approach provides greater flexibility when designing search strategies.

Retrievers are designed as part of a "retriever tree", a hierarchical structure that defines search operations by clarifying their order and logic. This structure makes complex searches more manageable and easier for developers to understand, and it allows new features to be added easily in the future.

Retrievers support composability, allowing you to build pipelines and integrate different retrieval strategies. This allows for easy testing of different retrieval combinations. They also provide more control over how documents are scored and filtered. For example, you can specify a minimum score threshold, apply complex filters without affecting the score, and use parameters such as terminate_after for performance optimization.
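
Because retrievers are plain nested objects, composing them on the client side amounts to building a dictionary. The sketch below shows one possible way to do that in Python. The helper names (standard, knn, rrf) are hypothetical and exist only for this example; they are not part of any Elastic client library. The filter and min_score elements on the standard retriever follow the retriever reference documentation as I understand it, so check them against your Elasticsearch version.

```python
import json


def standard(query, filter_clause=None, min_score=None):
    """Build a standard retriever (hypothetical helper)."""
    body = {"query": query}
    if filter_clause is not None:
        # Filters restrict the candidate set without affecting the score.
        body["filter"] = filter_clause
    if min_score is not None:
        body["min_score"] = min_score
    return {"standard": body}


def knn(field, model_id, model_text, k, num_candidates):
    """Build a kNN retriever that embeds the query text at search time."""
    return {
        "knn": {
            "field": field,
            "query_vector_builder": {
                "text_embedding": {"model_id": model_id, "model_text": model_text}
            },
            "k": k,
            "num_candidates": num_candidates,
        }
    }


def rrf(*children, rank_window_size, rank_constant):
    """Combine child retrievers under a single RRF retriever."""
    return {
        "rrf": {
            "retrievers": list(children),
            "rank_window_size": rank_window_size,
            "rank_constant": rank_constant,
        }
    }


search_body = {
    "retriever": rrf(
        standard(
            {"match": {"overview": "clueless slackers"}},
            filter_clause={"match": {"names": "clueless"}},
            min_score=0.5,
        ),
        knn("overview_dense", ".multilingual-e5-small_linux-x86_64",
            "clueless slackers", k=5, num_candidates=5),
        rank_window_size=5,
        rank_constant=1,
    ),
    "size": 3,
}

print(json.dumps(search_body, indent=2))
```

Swapping a retrieval strategy in or out is then a one-line change to the arguments of rrf, which is what makes testing different combinations straightforward.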

Retrievers also maintain backward compatibility: legacy query elements are automatically converted to the appropriate retrievers.
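
If you prefer to migrate request bodies by hand, the mapping for the simplest case can be pictured as follows. This Python sketch is only a conceptual illustration of moving a top-level query under a standard retriever; it is not how Elasticsearch performs the conversion internally, and the function name wrap_in_standard_retriever is hypothetical. It handles the query element only. Other legacy elements such as sort would need similar treatment and are left untouched here.

```python
import copy


def wrap_in_standard_retriever(legacy_body):
    """Return a copy of a legacy search body with its top-level query
    moved under a standard retriever (hypothetical helper)."""
    body = copy.deepcopy(legacy_body)
    if "retriever" in body:
        # Already uses the retriever syntax; nothing to do.
        return body
    query = body.pop("query", None)
    if query is not None:
        body["retriever"] = {"standard": {"query": query}}
    return body


legacy = {
    "query": {"term": {"overview": "clueless"}},
    "size": 3,
    "fields": ["names", "overview"],
    "_source": False,
}

migrated = wrap_in_standard_retriever(legacy)
print(migrated)
```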

Retriever usage examples

Let's look at some examples of using retrievers. We use the IMDB sample dataset.

You can run the accompanying Jupyter notebook, which imports the IMDB data into a Serverless Search project, and then run the following examples yourself!

The high-level setup is:

overview - a short summary of the movie
names - the name of the movie
overview_dense - a dense_vector generated with the e5-small model
overview_sparse - a sparse vector generated with Elastic's ELSER model
The examples use fields and set _source: false so that only the text of names and overview is returned.

Standard - Search all text!


GET /imdb_movies/_search?pretty
{
  "retriever": {
    "standard": {
      "query": {
        "term": {
          "overview": "clueless"
        }
      }
    }
  },
  "size": 3,
  "fields": [
    "names",
    "overview"
  ],
  "_source": false
}

kNN - Search all dense vectors!


GET /imdb_movies/_search?pretty
{
  "retriever": {
    "knn": {
      "field": "overview_dense",
      "query_vector_builder": {
        "text_embedding": {
          "model_id": ".multilingual-e5-small_linux-x86_64",
          "model_text": "clueless slackers"
        }
      },
      "k": 5,
      "num_candidates": 5
    }
  },
  "size": 3,
  "fields": [
    "names",
    "overview"
  ],
  "_source": false
}

text_expansion - Search all sparse vectors!


GET /imdb_movies/_search?pretty
{
  "retriever": {
    "standard": {
      "query": {
        "text_expansion": {
          "overview_sparse": {
            "model_id": ".elser_model_2_linux-x86_64",
            "model_text": "clueless slackers"
          }
        }
      }
    }
  },
  "size": 3,
  "fields": [
    "names",
    "overview"
  ],
  "_source": false
}

rrf - Combine everything!


GET /imdb_movies/_search?pretty
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": {
              "match": {
                "overview": "clueless slackers"
              }
            }
          }
        },
        {
          "knn": {
            "field": "overview_dense",
            "query_vector_builder": {
              "text_embedding": {
                "model_id": ".multilingual-e5-small_linux-x86_64",
                "model_text": "clueless slackers"
              }
            },
            "k": 5,
            "num_candidates": 5
          }
        },
        {
          "standard": {
            "query": {
              "text_expansion": {
                "overview_sparse": {
                  "model_id": ".elser_model_2_linux-x86_64",
                  "model_text": "clueless slackers"
                }
              }
            }
          }
        }
      ],
      "rank_window_size": 5,
      "rank_constant": 1
    }
  },
  "size": 3,
  "fields": [
    "names",
    "overview"
  ],
  "_source": false
}

Current limitations of retrievers

Retrievers come with certain restrictions that users should be aware of. For example, only the query element is allowed when using compound retrievers. This enforces a cleaner separation of concerns and prevents the complexity that would come from overly nested or interdependent configurations. Additionally, child retrievers may not use elements that are restricted when a compound retriever is part of the retriever tree.

These restrictions improve performance and composability even when using complex retrieval strategies.

Retrievers are initially released as a technology preview, so the API is subject to change.

Conclusion

Retrievers represent a major step forward in Elasticsearch's search capabilities and user-friendliness. They can be chained together in a pipeline fashion, with each retriever applying its logic and passing the results to the next item in the chain. Retrievers can significantly enhance the search experience by allowing more structured, flexible, and efficient search operations.

The following resources provide more detailed information about retrievers.

Try the above code yourself! You can run the accompanying Jupyter notebook, which imports the IMDB data into an Elastic Serverless Search project.

Ready to try it yourself? Start a free trial.
Want to get Elastic certified? Find out when the next Elasticsearch Engineer training starts!

Original: Elasticsearch retrievers - How to use search retrievers in Elasticsearch (Elastic Search Labs)