Splunk Query Optimization – 20 Tips for Writing Better SPL PLUS Additional Resources

Jon Papp
March 6, 2018

On Wednesday, February 28th, SP6 presented a webinar titled “How to Maximize Splunk Search Queries.” The webinar covered a number of ways to optimize Splunk queries to improve response time and reach insights faster.

The presentation included an overview of the basics of Splunk query optimization, examples of SPL best practices, and techniques for searching really big data.

Writing Better SPL
During the presentation, we detailed 7 Splunk Search Processing Language (SPL) best practices for faster search. As a rule of thumb, it’s best to be as specific as possible when writing queries. After that, keep the practices below in mind:

Filter data as early and as much as possible. Narrow the results in the base search instead of trimming them in later steps.
Avoid wildcards.
Use macros and subsearches instead of wildcards for list filtering.
Avoid using “NOT” – because the way Splunk implements NOT is NOT the way you might expect.
Avoid tags and eventtypes when writing an optimized search.
Run searches in FAST mode when using the Search bar to do interactive searching.
Position streaming commands, especially distributable streaming commands, before transforming commands.
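
As a minimal sketch of how several of these practices can fit together, consider the search below. The index, sourcetype, and field names (web, access_combined, status, clientip, uri_path) are hypothetical placeholders and were not part of the webinar. The base search names a specific index, sourcetype, field value, and time range rather than relying on wildcards or a later filter. The streaming commands (fields, eval) come before the transforming command (stats).

```spl
index=web sourcetype=access_combined status=500 earliest=-4h latest=now
| fields clientip, uri_path
| eval uri_path=lower(uri_path)
| stats count by clientip, uri_path
```

On the NOT point, one commonly cited difference is that status!=200 matches only events that have a status field with a different value, while NOT status=200 also matches events that have no status field at all.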

Key Questions on Splunk Query Optimization
After the webinar, we set aside time for a question-and-answer session. Below is a review of the key questions and answers.

Q: Can you explain the difference between a Summary Index and Data Model Acceleration? Why use one over the other?

The key difference is in the nuts and bolts of how each one works. With Summary Indexing, you run a scheduled search that you have defined, and the results of that search are saved in a Splunk index just like any other indexed data. Summary Indexing was used years ago, before Data Model Acceleration was released, but it had a weakness: if Splunk was down or needed to be restarted when the scheduled search was supposed to run, the results for that time frame were never populated, leaving gaps in the data. That made it difficult to be sure that summary results were up to date, accurate, and complete.

Data Model Acceleration solved the problem by automating all of this for you. If Splunk misses one of its search intervals, it catches up the next time the search runs, which keeps the data complete over time. Data Model Acceleration is an evolution of Summary Indexing. In most cases, for example when you’re using a premium product such as ES or ITSI, it’s the preferred way to summarize your data.
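
To make the contrast concrete, here is a rough sketch of what each approach can look like in SPL. The first search is the kind of scheduled search that could populate a summary index with the collect command. The second queries an accelerated data model with tstats. The index names (web, web_summary) are hypothetical, and the second search assumes a CIM-style Web data model with a Web.status field is installed and accelerated in your environment.

```spl
index=web sourcetype=access_combined earliest=-1h@h latest=@h
| bin _time span=1h
| stats count by _time, status
| collect index=web_summary

| tstats summariesonly=true count from datamodel=Web by Web.status
```

With the first approach, you own the schedule and any backfill after a missed run. With the second, Splunk maintains the summaries and the search only reads them.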

Q: Is it better to search for something like 127.0.0.1 THEN in a later step, src_ip=127.0.0.1 (to further restrict it)?

There’s a bit of nuance here. On one hand, Splunk’s raw search power makes it faster to just search on src_ip=127.0.0.1 in your initial search. On the other hand, if you limit to src_ip=127.0.0.1 in a later step, that regex still has to be run. It can depend on the specifics of your search string, but in general, this is the best approach. It may not make much of a difference in terms of search performance, but it may make a difference in terms of better limiting the search.
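
The two forms being compared can be sketched as follows. The index name (network) and the dest_ip field are hypothetical. The first search puts the field filter in the base search. The second searches for the raw term first and applies the field filter in a later step. Because the result depends on your data and field extractions, it is worth running both and comparing them in the Job Inspector rather than assuming one is always faster.

```spl
index=network src_ip=127.0.0.1
| stats count by dest_ip

index=network 127.0.0.1
| search src_ip=127.0.0.1
| stats count by dest_ip
```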

Q: What is the preferred approach for dealing with nested JSON objects? Mvexpand works well for a short time range but uses a lot of memory.

Nested JSON data can be tricky. When you have a JSON event in Splunk, there can be arrays contained in the event, and you’ll end up with multi-valued fields. You might use something like mvexpand to expand those events, but that can be a huge memory eater. For many data sources, a good first step is to try to break the arrays on ingestion. When I ingest that type of data, I might write a response handler in Python and break those events out when possible. If that doesn’t work, there are other techniques like spath. But depending on how big your JSON is, you’re probably going to end up with memory issues unless you can find a way to divide that event. It depends on the specifics of your situation.
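
When the data cannot be split at ingestion, one search-time pattern looks roughly like the sketch below. The index, sourcetype, and JSON keys (app, json_events, an items array whose elements have a name key) are hypothetical. The idea is to pull only the array out with spath and drop every other field before mvexpand, so that each expanded row carries as little data as possible. This is an illustration under those assumptions and may still hit memory limits on very large events.

```spl
index=app sourcetype=json_events
| spath path=items{} output=item
| fields _time, item
| mvexpand item
| spath input=item
| stats count by name
```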

Additional Resources
Splunk query optimization is a large topic and there are many different areas to explore. It’s important to familiarize yourself with Splunk documentation and Splunk reference pages on this topic. These are listed below, along with additional information on query optimization:

Splunk Query Optimization – One Size Does Not Fit All
When it comes to Splunk query optimization, one size (or one best practice) does not necessarily fit every situation. Organizations implement Splunk in different manners, and what works well for one deployment may not be the best fit for another. That’s where SP6 can add value to your organization’s Splunk deployment.

SP6 is a Splunk consulting firm focused on Splunk professional services, including Splunk deployment, ongoing Splunk administration, and Splunk development. SP6 also has a separate division that offers Splunk recruitment, placing Splunk professionals into direct-hire (FTE) roles for companies that need help acquiring their own full-time staff in today’s challenging market.
