
Solr: searching a large number of IDs or terms quickly

We came across a use case where we needed to search for a large number of documents by ID in Solr.

Solr provides many query parsers, and choosing the right one can improve query performance significantly.

For our use case, we wanted exact matches on document IDs, and our ID lists ranged from 100k to 1000k IDs.

We explored several approaches to achieve this: Boolean queries, filter queries, and terms queries.

1. Boolean Queries:

First, we searched on the id field using the standard query parser with OR operators.

Advantages:

Can search for documents using wildcards, ID ranges, partial matches, etc.

Disadvantages:

Slow

For a large number of search terms, this is a very CPU- and memory-intensive task. Every ID becomes a separate Boolean clause that fetches its own document set, and these document sets are then combined using the Boolean operators to produce the final result set. With 100k IDs, that means 100k document sets are fetched and combined.

Syntax:

id:(1 OR 2 OR 3 OR 4...)

If the default operator is OR, we can write it as below.

id:(1 2 3 4 5 ... )

Or, with id set as the default field, it can be written simply as:

df=id&q=1 2 3 4 5 6 7 8 9 ...
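As a rough illustration of the forms above, here is a minimal Python sketch that assembles the Boolean query string from a list of IDs. The helper name build_boolean_id_query is hypothetical, and the sketch assumes the IDs are plain numeric or alphanumeric values that need no query-syntax escaping.

```python
def build_boolean_id_query(ids, field="id", explicit_or=True):
    """Build a standard-parser query matching any of the given IDs.

    Hypothetical helper for illustration only. Assumes the IDs contain
    no characters that are special in Solr query syntax.
    """
    # With explicit_or=False the query relies on OR being the default operator.
    joiner = " OR " if explicit_or else " "
    return f"{field}:({joiner.join(str(i) for i in ids)})"


query = build_boolean_id_query([1, 2, 3, 4])
print(query)  # id:(1 OR 2 OR 3 OR 4)

# Same query, relying on OR being the default operator.
print(build_boolean_id_query([1, 2, 3, 4], explicit_or=False))  # id:(1 2 3 4)
```

Each ID still ends up as its own clause, so a list of 100k IDs produces a query with 100k clauses no matter which of the forms is used.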


2. Terms Query:

The terms query takes a different approach: it builds a single term filter from the given terms and applies that filter once to get the list of documents matching any of them.

Advantage:

Very fast

Disadvantages:

Only exact term matches

No scoring: it returns a constant score for every matching document.

Syntax:

{!terms f=fieldname separator=","}term1,term2,term3
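A minimal Python sketch of how such a request could be assembled for a long ID list is shown below. The helper name build_terms_params, the use of q rather than a filter query, and setting rows to the number of IDs are assumptions for illustration, not details from our setup. Sending the encoded body as a form POST to the collection's select handler is also an assumption; it is a common way to avoid URL length limits with very long ID lists.

```python
from urllib.parse import urlencode


def build_terms_params(ids, field="id"):
    """Build request parameters for a {!terms} lookup.

    Hypothetical helper for illustration only. Uses the default comma
    separator, so an ID that itself contains a comma is rejected.
    """
    terms = [str(i) for i in ids]
    if any("," in t for t in terms):
        raise ValueError("IDs must not contain the separator ','")
    # No space after the closing brace, so the first term has no leading blank.
    query = "{!terms f=%s}%s" % (field, ",".join(terms))
    # rows is set so that every requested document can be returned (assumption).
    return {"q": query, "rows": len(terms)}


params = build_terms_params(["doc1", "doc2", "doc3"])
print(params["q"])  # {!terms f=id}doc1,doc2,doc3

# Form-encoded body, suitable for a POST request with
# Content-Type: application/x-www-form-urlencoded.
body = urlencode(params)
```

All IDs travel in one query string here, which matches the single-filter behaviour described above instead of one clause per ID.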


With the terms query, we reduced our query time from 5-6 minutes to 5-6 seconds, with a much lower load on CPU and memory.


