A Practical Guide to Benchmarking Search Systems

Early in my career, I overheard a senior engineer say that "we should deploy these systems well below the knee point of the hockey stick," and I didn't understand what he meant. Sure, I had benchmarked systems before, and I even thought I was decent at it. My approach was simple: point a benchmarking client at the system, run it with 100 concurrent clients, and get two numbers, QPS and latency. Then report and move on. If the QPS was too low or the latency too high, we'd optimize or throw hardware at it. It took years of experience (and guidance) to understand how to properly benchmark search systems.

Here's a practical guide to benchmarking search engines, without too much ceremony. It's a starter guide, but I repeatedly find myself explaining these basics, so why not write an article about it.

Getting Started

Once you've built your search system with the required functionality and ranking strategy in place, you'll want to benchmark it to understand its performance and hardware requirements. First, find a representative set of queries – understand how many terms users typically use when searching. Then, use an HTTP benchmarking client to generate load. During this process, monitor your servers (a minimal monitoring sketch follows the list), focusing on:

- CPU utilization

- Disk I/O

- Network I/O
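In practice you'll often watch these with vmstat, iostat, or your cloud provider's dashboards. As a minimal sketch, a server-side sampler using Python's third-party psutil package could look like this (the one-second interval and MB/s output are just illustrative choices):

```python
# Minimal server-side monitor: sample CPU, disk, and network counters once
# per second and print the deltas. Requires the third-party psutil package.
import time

import psutil

INTERVAL_S = 1.0

def monitor() -> None:
    prev_disk = psutil.disk_io_counters()
    prev_net = psutil.net_io_counters()
    while True:
        time.sleep(INTERVAL_S)
        cpu = psutil.cpu_percent()  # CPU utilization since the previous call
        disk = psutil.disk_io_counters()
        net = psutil.net_io_counters()
        print(f"cpu={cpu:5.1f}%  "
              f"disk_read={(disk.read_bytes - prev_disk.read_bytes) / 1e6 / INTERVAL_S:7.1f} MB/s  "
              f"disk_write={(disk.write_bytes - prev_disk.write_bytes) / 1e6 / INTERVAL_S:7.1f} MB/s  "
              f"net_rx={(net.bytes_recv - prev_net.bytes_recv) / 1e6 / INTERVAL_S:7.1f} MB/s  "
              f"net_tx={(net.bytes_sent - prev_net.bytes_sent) / 1e6 / INTERVAL_S:7.1f} MB/s")
        prev_disk, prev_net = disk, net

if __name__ == "__main__":
    monitor()
```

Run something like this (or whatever tooling you prefer) on the search servers themselves while the benchmarking client generates load.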

The Single-Client Baseline

Start with no concurrency – a single user with the entire system to itself. If latency is already too high for your SLA, you'll need to cut features or add resources. This is also your opportunity to systematically test different features and queries to understand their performance impact. You can compare queries against each other; for example, longer queries that retrieve more documents may be slower. It's also useful to plot per-query latency against dimensions like the total number of hits or query length.
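A rough sketch of such a baseline run – assuming a hypothetical /search endpoint that takes a q parameter and returns the hit count in a totalHits field, so adjust both to your engine's actual API – is to replay queries one at a time and log what you need for those plots:

```python
# Single-client baseline: replay queries sequentially and record latency
# alongside query length and hit count for later plotting.
# Endpoint URL and the "totalHits" response field are assumptions.
import csv
import time

import requests

SEARCH_URL = "http://localhost:8080/search"  # hypothetical endpoint

def run_baseline(query_file: str, out_file: str) -> None:
    with open(query_file) as qf, open(out_file, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["query", "terms", "total_hits", "latency_ms"])
        for line in qf:
            query = line.strip()
            if not query:
                continue
            start = time.perf_counter()
            resp = requests.get(SEARCH_URL, params={"q": query}, timeout=10)
            latency_ms = (time.perf_counter() - start) * 1000
            hits = resp.json().get("totalHits", 0)  # field name is an assumption
            writer.writerow([query, len(query.split()), hits, f"{latency_ms:.1f}"])

if __name__ == "__main__":
    run_baseline("queries.txt", "baseline.csv")
```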

At this stage, examine utilization metrics. If any resource is already at 100%, you'll face challenges pushing higher throughput.

Increasing Concurrency

Next, simulate multiple users by increasing concurrency. Track latency and throughput as you add load. Continue until throughput plateaus and latency climbs sharply – you've found the knee of the hockey stick.
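The shape of such a sweep looks roughly like the sketch below. Dedicated load tools like wrk, k6, or vegeta do this better, but the measurement is the same: for each client count, run for a fixed duration and record throughput plus latency percentiles (the endpoint and query file are placeholders):

```python
# Concurrency sweep: for each client count, fire queries for a fixed
# duration and report throughput and latency percentiles.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

SEARCH_URL = "http://localhost:8080/search"  # hypothetical endpoint
DURATION_S = 30

def worker(queries, deadline):
    latencies = []
    i = 0
    while time.perf_counter() < deadline:
        start = time.perf_counter()
        requests.get(SEARCH_URL, params={"q": queries[i % len(queries)]}, timeout=10)
        latencies.append((time.perf_counter() - start) * 1000)
        i += 1
    return latencies

def sweep(queries, client_counts=(1, 2, 4, 8, 16, 32, 64)):
    for clients in client_counts:
        deadline = time.perf_counter() + DURATION_S
        with ThreadPoolExecutor(max_workers=clients) as pool:
            futures = [pool.submit(worker, queries, deadline) for _ in range(clients)]
            latencies = [lat for f in futures for lat in f.result()]
        qps = len(latencies) / DURATION_S
        p50 = statistics.median(latencies)
        p95 = statistics.quantiles(latencies, n=20)[18]  # 95th percentile
        print(f"clients={clients:3d}  qps={qps:8.1f}  p50={p50:7.1f} ms  p95={p95:7.1f} ms")

if __name__ == "__main__":
    with open("queries.txt") as f:
        sweep([q.strip() for q in f if q.strip()])
```

Plot QPS and p95 latency against client count; the knee is where QPS flattens out while p95 keeps climbing.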

This is what that senior engineer meant: know your system's concurrency and throughput limits. Deploy well below the knee, because a small increase in concurrency beyond it can make your system unresponsive.

Identify your bottleneck at this point. Is it I/O, CPU, memory, or network? If none show high utilization, look for software bottlenecks like insufficient threads or synchronization issues. Understanding these constraints helps you plan scaling strategies – whether to add servers, change instance types, or optimize code.

While you could push thousands of concurrent clients to find maximum throughput, the resulting latency numbers would be meaningless – they're dominated by queue time. This was my biggest mistake in those early days, before I understood how to benchmark search systems.
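The underlying reason is Little's law: in steady state, concurrency equals throughput times latency, so once throughput is pinned at its maximum, measured latency grows linearly with the number of clients, and almost all of it is time spent waiting in queue rather than searching. A back-of-the-envelope illustration with made-up numbers:

```python
# Illustrative only, with made-up numbers: past the knee, measured latency
# is dominated by queuing. Little's law: concurrency = throughput * latency,
# so latency is roughly clients / max_qps once the system is saturated.
max_qps = 500          # hypothetical saturated throughput of the system
service_time_ms = 40   # hypothetical per-query work at low load

for clients in (100, 500, 1000):           # all well past the knee here
    latency_ms = clients / max_qps * 1000  # expected average latency
    queue_ms = latency_ms - service_time_ms
    print(f"{clients:5d} clients -> ~{latency_ms:6.0f} ms average, "
          f"~{queue_ms:6.0f} ms of it queueing")
```

So use saturation runs only to find the throughput ceiling, and measure latency at the concurrency you actually expect in production – well below the knee.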

