Tired of Troubleshooting Idle Search Resources? Use OpenSearch Benchmark for Performance Tuning | by Noam Schwartz | Nov, 2022

Learn how to set up OpenSearch Benchmark, create “workloads”, and compare them between two computing devices

Photo by Ben White on Unsplash

OpenSearch users often want to know how their searches will perform in different environments, host types, and cluster configurations. OpenSearch Benchmark, a community-driven, open-source fork of Rally, is the ideal tool for that purpose.

OpenSearch Benchmark helps you reduce infrastructure costs by optimizing OpenSearch resource usage. It also lets you catch performance regressions and improve performance by running benchmarks periodically. Before benchmarking, you should try several other steps to improve performance, a topic I discussed in an earlier article.

In this article, I’ll walk you through setting up OpenSearch Benchmark and running a search performance benchmark that compares a widely used EC2 instance to a new computing accelerator, the Associative Processing Unit (APU) by Searchium.ai.

We will be using an m5.4xlarge (us-west-1) EC2 machine on which I installed OpenSearch and indexed a 9.1M-sized vector index called laion_text. The index is a subset of the large LAION dataset, where I converted the text field to a vector representation (using a CLIP model).

Install Python 3.8+ (including pip3), git 1.9+, and a JDK suitable for running OpenSearch. Make sure JAVA_HOME points to that JDK. Then run the following command:

sudo python3.8 -m pip install opensearch-benchmark

Tip: You may need to install each of the following dependencies manually.

  • sudo apt install python3.8-dev
  • sudo apt install python3.8-distutils
  • python3.8 -m pip install multidict --upgrade
  • python3.8 -m pip install attrs --upgrade
  • python3.8 -m pip install yarl --upgrade
  • python3.8 -m pip install async_timeout --upgrade
  • python3.8 -m pip install charset_normalizer --upgrade
  • python3.8 -m pip install aiosignal --upgrade
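
Before installing, it can help to confirm the prerequisites listed earlier (Python 3.8+, git 1.9+, and JAVA_HOME). The following is a minimal sketch and not part of OpenSearch Benchmark. The helper names parse_git_version and check_prerequisites are hypothetical, and the script only checks that JAVA_HOME points to an existing directory, not that the JDK suits your OpenSearch version.

```python
import os
import re
import shutil
import subprocess
import sys


def parse_git_version(output):
    """Extract (major, minor) from text such as 'git version 2.34.1'."""
    match = re.search(r"(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"cannot parse git version from {output!r}")
    return int(match.group(1)), int(match.group(2))


def check_prerequisites(python_version, git_output, java_home):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(python_version[:2]) < (3, 8):
        problems.append("Python 3.8+ is required")
    if git_output is None:
        problems.append("git was not found on PATH")
    elif parse_git_version(git_output) < (1, 9):
        problems.append("git 1.9+ is required")
    if not java_home or not os.path.isdir(java_home):
        problems.append("JAVA_HOME is unset or does not point to a directory")
    return problems


if __name__ == "__main__":
    git_output = None
    if shutil.which("git"):
        git_output = subprocess.run(
            ["git", "--version"], capture_output=True, text=True, check=True
        ).stdout
    for problem in check_prerequisites(
        sys.version_info, git_output, os.environ.get("JAVA_HOME")
    ):
        print("WARNING:", problem)
```

If the script prints nothing, the basic prerequisites appear to be in place.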

To verify that the installation was successful, run the following:

opensearch-benchmark list workloads

You should see the following details:

screenshot by author

By default, OpenSearch Benchmark’s reporting datastore is set to “in-memory”. With “in-memory”, all metrics are kept in memory while the benchmark runs. If the datastore is set to “opensearch”, all metrics are written to a persistent metrics store, and the data remains available for further analysis.

To save the reported results to your OpenSearch cluster, open the opensearch-benchmark.ini file, which can be found in the ~/.benchmark folder, and modify the results publishing section in the highlighted area so that it writes to the OpenSearch cluster:

screenshot by author
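
Since the screenshot is not reproduced here, the change can be sketched in code. This is a hedged example, not the author's exact configuration: the section name results_publishing and the datastore.* keys are assumptions based on the layout of the generated configuration file as I understand it, so check them against your own file. The host, port, and credentials are placeholders.

```python
import configparser
import io


def enable_opensearch_datastore(ini_text, host, port, user, password):
    """Return ini_text with [results_publishing] pointed at an OpenSearch cluster."""
    # interpolation=None so that '%' characters in passwords are kept verbatim
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(ini_text)
    if not config.has_section("results_publishing"):  # assumed section name
        config.add_section("results_publishing")
    section = config["results_publishing"]
    section["datastore.type"] = "opensearch"  # instead of "in-memory"
    section["datastore.host"] = host
    section["datastore.port"] = str(port)
    section["datastore.secure"] = "False"  # plain HTTP; adjust for TLS clusters
    section["datastore.user"] = user
    section["datastore.password"] = password
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "[results_publishing]\ndatastore.type = in-memory\n"
    print(enable_opensearch_datastore(sample, "localhost", 9200, "admin", "admin"))
```

Note that configparser drops comments when it rewrites a file, so editing the file by hand may be preferable if you want to keep them.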
Photo by Scott Blake on Unsplash

Now that we have OpenSearch Benchmark properly set up, it’s time to start benchmarking!

The plan is to use OpenSearch Benchmark to compare searches between two computing devices. You can use the following method to benchmark and compare any instances you want. In this example, we will test the commonly used KNN flat search (an ANN example using IVF and HNSW will be covered in my next article) and compare an m5.4xlarge EC2 instance to the APU.

You can access the APU through a plugin downloaded from Searchium.ai’s SaaS platform. You can test the following benchmarking process on your environment and data. A free trial is available, and registration is simple.

Each test/track is called a “workload” in OpenSearch Benchmark. We will create a workload for the search on the m5.4xlarge, which will serve as our baseline. We will also create a workload for the search on the APU, which will act as our contender. Later, we will compare the performance of both workloads.

Let’s start by creating a workload for both the m5.4xlarge (CPU) and the APU using the laion_text index (make sure you run these commands from inside the .benchmark directory):

opensearch-benchmark create-workload --workload=laion_text_cpu --target-hosts=localhost:9200 --indices="laion_text"
opensearch-benchmark create-workload --workload=laion_text_apu --target-hosts=localhost:9200 --indices="laion_text"

Note: If the workloads were saved in a workloads folder in your home folder, you will need to copy them to the .benchmark/benchmarks/workloads/default directory.
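
The copy step can also be scripted. The sketch below assumes the generated workloads ended up in ~/workloads and that the target is ~/.benchmark/benchmarks/workloads/default; adjust both paths if your layout differs. The helper name copy_workloads is hypothetical.

```python
import shutil
from pathlib import Path


def copy_workloads(source_dir, target_dir, names):
    """Copy each named workload folder from source_dir into target_dir."""
    copied = []
    for name in names:
        src = Path(source_dir) / name
        if not src.is_dir():
            continue  # this workload was not generated in source_dir
        dst = Path(target_dir) / name
        # dirs_exist_ok requires Python 3.8+; existing files are overwritten
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(dst)
    return copied


if __name__ == "__main__":
    home = Path.home()
    copy_workloads(
        home / "workloads",  # assumed source location
        home / ".benchmark" / "benchmarks" / "workloads" / "default",
        ["laion_text_cpu", "laion_text_apu"],
    )
```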

Run opensearch-benchmark list workloads again and note that both laion_text_cpu and laion_text_apu are listed.

Next, we will add operations to the test schedule. You can add as many benchmarking tests as you want to this section. Add each test to the schedule in the workload.json file, which can be found in the folder named after the workload you want to benchmark.

In our case, it can be found in the following locations:

  • .benchmark/benchmarks/workloads/default/laion_text_apu
  • .benchmark/benchmarks/workloads/default/laion_text_cpu

We want to test our OpenSearch search. Create an operation called “single vector search” (or any other name) and include a query vector. I truncated the vector itself because a 512-dimension vector would be a bit long… Add the desired query vector, and be sure to copy the same vector into both the m5.4xlarge (CPU) and APU workload.json files!

Next, add the parameters you want. In this example, I’ll stick with the default eight clients and 1,000 iterations.
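
To make the shape of such a schedule entry concrete, here is a hedged sketch that builds one in Python and prints it as JSON. It is not a copy of the workload files used in this article. It assumes the flat search is issued as an exact k-NN query through the k-NN plugin's scoring script (which takes a query_value parameter), and the field name "vector", the space type, and the result size are placeholders you would replace with your own.

```python
import json


def build_search_operation(query_vector, field="vector", clients=8, iterations=1000):
    """Build one schedule entry that runs an exact k-NN (script score) search."""
    return {
        "operation": {
            "name": "single vector search",
            "operation-type": "search",
            "index": "laion_text",
            "body": {
                "size": 10,  # assumed number of hits to return
                "query": {
                    "script_score": {
                        "query": {"match_all": {}},
                        "script": {
                            "source": "knn_score",
                            "lang": "knn",
                            "params": {
                                "field": field,  # hypothetical field name
                                "query_value": query_vector,
                                "space_type": "cosinesimil",  # assumed metric
                            },
                        },
                    }
                },
            },
        },
        "clients": clients,
        "iterations": iterations,
    }


if __name__ == "__main__":
    placeholder_vector = [0.1] * 512  # replace with your real query vector
    entry = build_search_operation(placeholder_vector)
    print(json.dumps(entry)[:200], "...")
```

Generating the entry from one function makes it easier to guarantee that both workload.json files receive the identical query vector.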

m5.4xlarge (CPU) workload.json:

"query_value":[INSERT VECTOR HERE],

APU workload.json:


Photo by Tim Gouve on Unsplash

It’s time to run our workload! We are interested in running our search workloads on a running OpenSearch cluster. I added some parameters to the execute_test command:

distribution-version: Be sure to add your correct OpenSearch version.

workload: The name of our workload.

Other parameters are available. I added pipeline, client-options, and on-error, which simplify the whole process.

Go ahead and run the following command, which will run our workload:

opensearch-benchmark execute_test --distribution-version=2.2.0 --workload=laion_text_apu --pipeline=benchmark-only --client-options="use_ssl:false,verify_certs:false,timeout:320" --on-error=abort
opensearch-benchmark execute_test --distribution-version=2.2.0 --workload=laion_text_cpu --pipeline=benchmark-only --client-options="use_ssl:false,verify_certs:false,timeout:320" --on-error=abort
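
If you prefer to drive both runs from a script, the commands can be assembled in Python. This is only a sketch: the helper name build_execute_test_command is hypothetical, and it passes all client options as a single comma-separated value so that no option is lost if a repeated flag overrides an earlier one.

```python
import shlex
import subprocess


def build_execute_test_command(workload, distribution_version="2.2.0", timeout_s=320):
    """Return the argument list for one benchmark run against a running cluster."""
    client_options = f"use_ssl:false,verify_certs:false,timeout:{timeout_s}"
    return [
        "opensearch-benchmark",
        "execute_test",
        f"--distribution-version={distribution_version}",
        f"--workload={workload}",
        "--pipeline=benchmark-only",
        f"--client-options={client_options}",
        "--on-error=abort",
    ]


if __name__ == "__main__":
    for workload in ("laion_text_apu", "laion_text_cpu"):
        command = build_execute_test_command(workload)
        print(shlex.join(command))
        # Uncomment to actually launch the benchmark (the CPU run can take hours):
        # subprocess.run(command, check=True)
```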

And now we wait…

Our results should look like the following:

laion_text_apu (APU) result
laion_text_cpu (m5.4xlarge) result

We are finally ready to see our test results. Drumroll, please…

First, we noticed that the running time of each workload was different. The m5.4xlarge (CPU) workload took 6.45 hours, while the APU workload took 2.78 minutes (139x faster). This is because the APU supports query aggregation, allowing for greater throughput.

Now, we want a more comprehensive comparison between our workloads. OpenSearch Benchmark enables us to generate a CSV file in which we can easily compare the workloads.

First, we’ll need to find the workload ID for each case. This can be done by looking either at the OpenSearch benchmark-test-performance index (which was created in step 2) or in the benchmarks folder:

Using the workload IDs, run the following command to compare the two workloads and write the output to a CSV file:

opensearch-benchmark compare --results-format=csv --show-in-results=search --results-file=data.csv --baseline=ecb4af7a-d53c-4ac3-9985-b5de45daea0d --contender=b714b13a-af8e-4103-a4c6-558242b8fe6a
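
Once data.csv exists, it can be read with a few lines of Python. The sketch below assumes the file has a header row with Metric, Baseline, and Contender columns, which you should verify against your own output. The sample rows are made-up placeholders, not the results from this article.

```python
import csv
import io


def load_comparison(csv_text):
    """Parse the compare output into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def metric_ratio(rows, metric, baseline_col="Baseline", contender_col="Contender"):
    """Return baseline / contender for the first row whose Metric matches."""
    for row in rows:
        if row["Metric"] == metric:
            return float(row[baseline_col]) / float(row[contender_col])
    raise KeyError(metric)


if __name__ == "__main__":
    # Made-up placeholder rows, NOT the article's results.
    sample = (
        "Metric,Task,Baseline,Contender,Diff,Unit\n"
        "Mean Throughput,example-task,10,20,10,ops/s\n"
        "50th percentile service time,example-task,400,100,-300,ms\n"
    )
    rows = load_comparison(sample)
    print(metric_ratio(rows, "50th percentile service time"))
```

For a real run, read the file contents first (for example with pathlib.Path("data.csv").read_text()) and pass them to load_comparison.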

Here is a brief summary of our results:

image by author

A brief description of the metrics in the table:

  1. Throughput: The number of operations OpenSearch can perform within a given period of time, usually per second.

  2. Latency: The time between submitting a request and receiving the complete response. It also includes wait time, that is, the time the request spends waiting until it is ready to be serviced by OpenSearch.

  3. Service Time: The time between sending a request and receiving the corresponding response. This metric can easily be confused with latency, but it does not include wait time. It is what most load testing tools incorrectly call “latency”.

  4. Test Execution Time: The total runtime of the workload from start to finish.
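
The difference between latency and service time is easiest to see with a toy model. The sketch below is a simplified illustration, not how OpenSearch Benchmark is implemented: a single client is scheduled to send one request every interval_ms, each request takes service_time_ms to serve, and a request has to wait whenever the previous one is still in flight.

```python
def simulate_single_client(service_time_ms, interval_ms, requests):
    """Schedule one request every interval_ms; each takes service_time_ms to serve."""
    results = []
    client_free_at = 0
    for i in range(requests):
        scheduled_at = i * interval_ms
        sent_at = max(scheduled_at, client_free_at)  # wait while still busy
        done_at = sent_at + service_time_ms
        results.append(
            {
                "wait_ms": sent_at - scheduled_at,
                "service_time_ms": done_at - sent_at,
                "latency_ms": done_at - scheduled_at,  # wait + service time
            }
        )
        client_free_at = done_at
    return results


if __name__ == "__main__":
    for row in simulate_single_client(service_time_ms=200, interval_ms=100, requests=3):
        print(row)
```

When the service time exceeds the scheduling interval, the service time stays constant while the latency keeps growing, which is why the two metrics should not be mixed up.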

Looking at our results, we can see that the service time for the APU workload was 127 times faster than for the m5.4xlarge workload. From a cost perspective, running the same workload on the APU cost $0.23, as opposed to $5.78 on the m5.4xlarge (25x less expensive), and we got our search results about 6.4 hours sooner.
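
The runtime and cost ratios follow directly from the figures quoted in this article, as this short check shows. It uses only the reported runtimes (6.45 hours and 2.78 minutes) and costs ($5.78 and $0.23); the variable and function names are my own.

```python
CPU_RUNTIME_MIN = 6.45 * 60  # 6.45 hours on the m5.4xlarge
APU_RUNTIME_MIN = 2.78       # 2.78 minutes on the APU
CPU_COST_USD = 5.78
APU_COST_USD = 0.23


def ratio(baseline, contender):
    """How many times larger the baseline value is than the contender value."""
    return baseline / contender


if __name__ == "__main__":
    print(f"runtime ratio: {ratio(CPU_RUNTIME_MIN, APU_RUNTIME_MIN):.0f}x")
    print(f"cost ratio: {ratio(CPU_COST_USD, APU_COST_USD):.0f}x")
    print(f"time saved: {(CPU_RUNTIME_MIN - APU_RUNTIME_MIN) / 60:.2f} hours")
```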

Now, imagine the magnitude of these benefits when scaling up to even larger datasets, which is likely to be the case in our data-driven, fast-paced world.

I hope this helped you understand more about the power of OpenSearch’s benchmarking tool and how you can use it to benchmark your search performance.

For more information about Searchium.ai’s plugins and APUs, please visit www.searchium.ai. They even offer a free trial!

Many thanks to Dmitry Sosnovsky and Yaniv Vaknin for all their help!
