Learn how to set up OpenSearch Benchmark, create “workloads”, and compare benchmark results between two computing devices
OpenSearch users often want to know how their search will perform in different environments, host types, and cluster configurations. OpenSearch Benchmark, a community-driven, open-source fork of Rally, is the ideal tool for that purpose.
OpenSearch Benchmark helps you reduce infrastructure costs by optimizing OpenSearch resource usage. The tool also enables you to discover performance regressions and improve performance by running benchmarks periodically. Before benchmarking, there are several other steps you should try to improve performance – a topic I discussed in an earlier article.
In this article, I’ll walk you through setting up OpenSearch Benchmark and running a search performance benchmark that compares a widely used EC2 instance with a new computing accelerator – the Associative Processing Unit (APU) by Searchium.ai.
We will be using an m5.4xlarge (us-west-1) EC2 machine on which I installed OpenSearch and indexed a 9.1M-vector index called laion_text. The index is a subset of the large LAION dataset, where I converted the text field to a vector representation (using a CLIP model):
Install Python 3.8+ including pip3, git 1.9+, and a JDK suitable for running OpenSearch. Make sure JAVA_HOME points to that JDK. Then run the following command:
sudo python3.8 -m pip install opensearch-benchmark
Tip: You may need to install some of the dependencies manually:
sudo apt install python3.8-dev
sudo apt install python3.8-distutils
python3.8 -m pip install multidict --upgrade
python3.8 -m pip install attrs --upgrade
python3.8 -m pip install yarl --upgrade
python3.8 -m pip install async_timeout --upgrade
python3.8 -m pip install charset_normalizer --upgrade
python3.8 -m pip install aiosignal --upgrade
To verify that the installation was successful, run the following:
opensearch-benchmark list workloads
You should see the following details:
By default, OpenSearch Benchmark’s results store is set to “in-memory”. If set to “in-memory”, all metrics are kept in memory while the benchmark is running. If set to “OpenSearch”, all metrics are instead written to a persistent metrics store, and the data will be available for further analysis.
To save the reported results to your OpenSearch cluster, open the opensearch-benchmark.ini file, which can be found in the ~/.benchmark folder, and modify the results publishing section so that the results are written to your OpenSearch cluster.
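As a rough guide, that section may look something like the sketch below (the exact key names can vary between OpenSearch Benchmark versions, and the host, port, and credentials are placeholders for your own cluster):

[results_publishing]
datastore.type = opensearch
datastore.host = localhost
datastore.port = 9200
datastore.secure = false
datastore.user =
datastore.password =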
Now that we have OpenSearch Benchmark properly set up, it’s time to start benchmarking!
We are going to use OpenSearch Benchmark to compare searches between two computing devices. You can use the following method to benchmark and compare any setup you want. In this example, we will test the commonly used k-NN flat search (an ANN example using IVF and HNSW will be covered in my next article) and compare an m5.4xlarge EC2 instance to the APU.
You can access the APU through a plugin downloaded from Searchium.ai’s SaaS platform. You can test the following benchmarking process in your own environment and with your own data; a free trial is available, and registration is simple.
Each test/track is called a “workload” in OpenSearch Benchmark. We will create a workload for the search on the m5.4xlarge, which will serve as our baseline. We will also create a workload for the search on the APU, which will act as our contender. Later, we will compare the performance of both workloads.
Let’s start by creating a workload for the laion_text index on both the m5.4xlarge (CPU) and the APU (make sure you run these commands from inside the .benchmark directory):
opensearch-benchmark create-workload --workload=laion_text_cpu --target-hosts=localhost:9200 --indices="laion_text"
opensearch-benchmark create-workload --workload=laion_text_apu --target-hosts=localhost:9200 --indices="laion_text"
Note: If the generated workloads were saved in the workloads folder in your home directory, you will need to copy them to the .benchmark/benchmarks/workloads/default directory.
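For example, assuming the generated workload folders ended up under ~/workloads (your output path may differ), copying them into place could look like this:

cp -r ~/workloads/laion_text_cpu ~/.benchmark/benchmarks/workloads/default/
cp -r ~/workloads/laion_text_apu ~/.benchmark/benchmarks/workloads/default/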
Run opensearch-benchmark list workloads again and note that both laion_text_cpu and laion_text_apu are now listed.
Next, we will add operations to the test schedule. You can add as many benchmarking tests as you want in this section. Add each test to the schedule in the workload.json file, which can be found in the folder named after the index you want to benchmark. In our case, it can be found in the following locations:
~/.benchmark/benchmarks/workloads/default/laion_text_apu
~/.benchmark/benchmarks/workloads/default/laion_text_cpu
We want to test our OpenSearch search. Create an operation called “single-vector-search” (or any other name) and include a query vector. I truncated the vector itself because a 512-dimension vector would be a bit long... Add the desired query vector, and be sure to copy the same vector into both the m5.4xlarge (CPU) and APU workload.json files!
Next, add the parameters you want. In this example, I’ll stick with the default eight clients and 1,000 iterations.
m5.4xlarge (CPU) workload.json:
"schedule":[
{
"operation":{
"name":"single-vector-search",
"operation-type":"search",
"body":{
"size":"10",
"query":
"script_score":
"query":
"match_all":
,
"script":
"source":"knn_score",
"lang":"knn",
"params":
"field":"vector",
"query_value":[INSERT VECTOR HERE],
"space_type":"cosinesimil"
}
},
"clients":8,
"warmup-iterations":1000,
"iterations":1000,
"target-throughput":100
}
]
APU workload.json:
"schedule":[
{
"operation":
"name":"single-vector-search",
"operation-type":"search",
"body":
"size":"10",
"query":
"gsi_knn":
"field":"vector",
"vector":[INSERT VECTOR HERE],
"topk":"10"
,
"clients":8,
"warmup-iterations":1000,
"iterations":1000,
"target-throughput":100
}
]
It’s time to run our workloads! We want to run our search workloads against an already running OpenSearch cluster, so I added a few parameters to the execute_test command:
- distribution-version – be sure to set your correct OpenSearch version.
- workload – the name of our workload.
Other parameters are available; I also added pipeline, client-options, and on-error, which simplify the whole process.
Go ahead and run the following commands, which will run our workloads:
opensearch-benchmark execute_test --distribution-version=2.2.0 --workload=laion_text_apu --pipeline=benchmark-only --client-options="verify_certs:false,use_ssl:false,timeout:320" --on-error=abort
opensearch-benchmark execute_test --distribution-version=2.2.0 --workload=laion_text_cpu --pipeline=benchmark-only --client-options="verify_certs:false,use_ssl:false,timeout:320" --on-error=abort
And now we wait…
Our results should look like the following:
We are finally ready to see our test results. Drumroll, please…
First, we noticed that the running time of each workload was different. The m5.4xlarge (CPU) workload took 6.45 hours, while the APU workload took 2.78 minutes (139x faster). This is because the APU supports query aggregation, allowing for greater throughput.
Now, we want a more comprehensive comparison between our workloads. OpenSearch Benchmark enables us to generate a CSV file in which we can easily compare the two workloads.
First, we’ll need to find the workload ID for each case. This can be done either by looking at the benchmark-test-performance index in OpenSearch (which was created in step 2) or in the benchmarks folder:
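If you go the folder route, a quick way to see the available IDs is to list the test execution directories – this assumes the default data layout, where each test execution gets a folder named after its ID (the exact subfolder name may differ between OpenSearch Benchmark versions):

ls ~/.benchmark/benchmarks/test_executions/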
Using the workload IDs, run the following command to compare the two workloads and save the output to a CSV file:
opensearch-benchmark compare --results-format=csv --show-in-results=search --results-file=data.csv --baseline=ecb4af7a-d53c-4ac3-9985-b5de45daea0d --contender=b714b13a-af8e-4103-a4c6-558242b8fe6a
Here is a brief summary of our results:
A brief description of the metrics in the table:
1. Throughput: The number of operations OpenSearch can perform within a given period of time, usually per second.
2. Latency: The time between submitting a request and receiving the complete response. This also includes wait time, i.e., the time the request spends waiting until it is ready to be serviced by OpenSearch.
3. Service Time: The time between sending a request and receiving the corresponding response. This metric is easily confused with latency but does not include wait time; it is what most load-testing tools incorrectly report as “latency”. (For example, if a request waits 5 ms in a queue and takes 20 ms to process, the service time is 20 ms while the latency is 25 ms.)
4. Test Execution Time: The total runtime of the workload from start to finish.
Looking at our results, we can see that the service time for the APU workload was 127 times faster than for the m5.4xlarge workload. From a cost perspective, running the same workload on the APU costs $0.23, as opposed to $5.78 on the m5.4xlarge (25x less expensive), and we got our search results roughly 6.4 hours sooner.
Now, imagine the magnitude of these benefits when scaling up to the large datasets that are increasingly common in our data-driven, fast-paced world.
I hope this helped you understand more about the power of OpenSearch’s benchmarking tool and how you can use it to benchmark your search performance.
For more information about Searchium.ai’s plugins and APUs, please visit www.searchium.ai. They even offer a free trial!
Many thanks to Dmitry Sosnovsky and Yaniv Vaknin for all their help!