Editor’s Note: This article is co-authored by Barry Duncan, Team Lead, Professional Services Observability; and Chase Yates, Practice Manager, Security.
A common problem technology teams face is the high volume of noise in their environments and the resulting alert fatigue.
Items clamoring for an analyst’s attention can include:
- Threats
- Anomalies
- Incidents
- Notable Events
According to Splunk, over 40% of organizational Security Operations Centers (SOCs) experience more than 10,000 alerts each day. Of those, over half are false positives.
Many companies lack sufficient technology staff. As a result, overworked and overwhelmed analysts may be slow to detect and respond to data threats. Often, analysts suffer from job burnout and quit. It can become a vicious cycle.
Fortunately, it’s possible to solve the twin problems of noise and alert fatigue. Members of the SP6 service team have identified some steps you can take to work more effectively. You’ll also improve your security coverage in the process.
“In today’s world of security monitoring, with so much emphasis placed on alert tuning, there is a significant risk in trying to tune out as much noise as possible. Legitimately bad things may be excluded from your result set in the name of noise reduction,” said Chase Yates, Practice Manager, Security at SP6.
This is where Risk-Based Alerting (RBA) comes in. “We use it to make sure our monitoring teams are focusing on the right alerts while reducing the risk of tuning bad things out,” Yates said. “Not only will RBA reduce noise, true positive rates significantly increase, and the riskiest events bubble up to the top, ensuring your team is focusing on the right alerts first.”
Advantages of Risk-Based Alerting – Security
The following security scenario takes place in Splunk Enterprise Security, where RBA allows technology teams to capably manage alerts.
Here’s how it works. Splunk analysts create entity attributions and send them to the Risk Index in Enterprise Security. Note that you are not generating alerts directly; you are forwarding observations to the Risk Index. (You can also use behavioral patterns to set thresholds.)
As you create attributions, apply the appropriate context to them. For example:
- Annotate an attribution with a relevant MITRE ATT&CK tactic or Kill Chain stage.
- Or, apply a custom annotation of your choice.
Annotating attributions in this way turns the Risk Index into a collection of risky behavior you can mine.
While each observation you send to the Risk Index may not indicate a threat on its own, it can indicate a threat when put in the context of the risky attributions/annotations.
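To make the idea concrete, here is a minimal Python sketch of annotated attributions collected in a stand-in for the Risk Index. It is an illustration only, not Splunk’s schema or API: the `RiskAttribution` class, its field names, and the sample users and messages are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskAttribution:
    """One observation sent to a simulated risk index (field names are illustrative)."""
    risk_object: str         # the user or system the observation is about
    risk_score: int          # points this observation contributes
    risk_message: str        # short human-readable context
    annotations: tuple = ()  # e.g. ATT&CK tactics or custom labels


def mine_by_object(risk_index):
    """Group attributions by risk object so related behavior can be read together."""
    grouped = defaultdict(list)
    for attribution in risk_index:
        grouped[attribution.risk_object].append(attribution)
    return dict(grouped)


# Hypothetical sample data: none of these observations is an alert on its own.
risk_index = [
    RiskAttribution("jdoe", 20, "Multiple failed logins", ("Credential Access",)),
    RiskAttribution("jdoe", 30, "Added to an admin group", ("Privilege Escalation",)),
    RiskAttribution("web01", 10, "Unusual outbound port", ("custom:network-anomaly",)),
]

for risk_object, items in mine_by_object(risk_index).items():
    labels = sorted({label for item in items for label in item.annotations})
    print(risk_object, sum(item.risk_score for item in items), labels)
```

Read per risk object, the two low-scoring observations for the same user tell a more worrying story together than either does alone.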
Using a Risk Score to Create an Alert
If you use the risk score method, you can generate an alert when the aggregated risk score of a user or system within your environment surpasses 100 within 24 hours. Another option is to create a rule that says something like, “Generate an alert for each risk object with activity spanning three or more MITRE ATT&CK tactics over the past 14 days.”
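Both rules can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how Enterprise Security implements its risk rules: the event dictionaries, the function names `score_alerts` and `tactic_alerts`, and the sample data are hypothetical. Only the thresholds (100 points in 24 hours, three tactics in 14 days) come from the text above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def score_alerts(events, now, threshold=100, window=timedelta(hours=24)):
    """Risk objects whose summed score inside the window surpasses the threshold."""
    totals = defaultdict(int)
    for event in events:
        if now - window <= event["time"] <= now:
            totals[event["risk_object"]] += event["risk_score"]
    return {obj: total for obj, total in totals.items() if total > threshold}


def tactic_alerts(events, now, min_tactics=3, window=timedelta(days=14)):
    """Risk objects whose activity inside the window spans at least min_tactics tactics."""
    tactics = defaultdict(set)
    for event in events:
        if now - window <= event["time"] <= now:
            tactics[event["risk_object"]].add(event["tactic"])
    return {obj: sorted(seen) for obj, seen in tactics.items() if len(seen) >= min_tactics}


def make_event(risk_object, risk_score, tactic, time):
    return {"risk_object": risk_object, "risk_score": risk_score, "tactic": tactic, "time": time}


# Hypothetical sample data.
now = datetime(2024, 1, 15, 12, 0)
events = [
    make_event("jdoe", 40, "Credential Access", now - timedelta(hours=2)),
    make_event("jdoe", 35, "Privilege Escalation", now - timedelta(hours=5)),
    make_event("jdoe", 30, "Exfiltration", now - timedelta(hours=20)),
    make_event("asmith", 60, "Discovery", now - timedelta(hours=3)),
    make_event("asmith", 50, "Discovery", now - timedelta(hours=30)),  # outside 24 hours
    make_event("web01", 10, "Initial Access", now - timedelta(days=10)),
    make_event("web01", 10, "Execution", now - timedelta(days=6)),
    make_event("web01", 10, "Persistence", now - timedelta(days=1)),
]

print(score_alerts(events, now))
print(tactic_alerts(events, now))
```

In this sample, `web01` never comes close to 100 points, yet the tactic rule still surfaces it because its low-scoring activity spans three tactics over ten days. That is the kind of slow pattern a pure score threshold can miss.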
By using this approach, you embed the value of cybersecurity frameworks in your detections and take them from concept to operational reality.
Additionally, you can scan for outliers in particular business units or Active Directory roles, and generate alerts when a user’s risk score is one or two standard deviations above the norm for that unit or role.
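As a rough sketch of that outlier check, the Python below compares each user’s risk score with the mean and standard deviation of their own unit. The function name `risk_outliers`, the user names, the scores, and the choice of population standard deviation are assumptions for illustration; a production rule would need to decide how to handle very small groups.

```python
from collections import defaultdict
from statistics import mean, pstdev


def risk_outliers(scores, groups, num_std=2.0):
    """Flag users scoring more than num_std standard deviations above their group's mean.

    Returns a mapping of flagged user to the cutoff they exceeded.
    """
    by_group = defaultdict(list)
    for user, score in scores.items():
        by_group[groups[user]].append(score)

    flagged = {}
    for user, score in scores.items():
        peers = by_group[groups[user]]
        if len(peers) < 2:
            continue  # no meaningful baseline for a group of one
        cutoff = mean(peers) + num_std * pstdev(peers)
        if score > cutoff:
            flagged[user] = round(cutoff, 1)
    return flagged


# Hypothetical risk scores and business units.
scores = {"alice": 10, "bob": 12, "carol": 11, "dave": 9, "erin": 13, "frank": 60,
          "grace": 40, "heidi": 42, "ivan": 38, "judy": 41}
groups = {"alice": "finance", "bob": "finance", "carol": "finance",
          "dave": "finance", "erin": "finance", "frank": "finance",
          "grace": "engineering", "heidi": "engineering",
          "ivan": "engineering", "judy": "engineering"}

print(risk_outliers(scores, groups, num_std=2.0))
```

Comparing within a unit matters here: a score around 40 is routine for the engineering group in this sample, while 60 stands out sharply in finance, where most users score near 10.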
As you can see in the two graphics above, the SOC created a rule that triggers an alert any time a user’s aggregated risk score exceeds 100 in a 24-hour period.
By clicking into the alert, the analyst can view each attribution contributing to that alert in a Risk Message. This provides them with instant context. They’ll see that several lower-level alerts they could have overlooked or suppressed are all attributable to the same user – and pose a potential threat.
Benefits of Risk-Based Alerting – Security
Once you initiate an RBA process within Splunk, your SOC team should notice a significant reduction in alert volumes – 60% to 80% in many cases. In addition, some customers have lowered their percentage of abandoned alerts to nearly zero.
What’s more, detection quality goes up. One customer reported their true positive rate doubled in the space of a few months. Others say they’ve been able to identify scenarios such as slow, prolonged attacks they would have had trouble detecting if they used traditional correlation searches.
Yet another benefit of RBA is that it will help you improve your security maturity and pinpoint areas you can improve upon. To do so, take the MITRE ATT&CK matrix, NIST, CIS 20, or any framework you choose, and annotate your searches with the relevant tactics, controls, or other framework components. The annotations then show which parts of the framework your detections cover and which they do not.
Finally, after initiating RBA, customers say they’ve reduced their operational expenses. How? Ordinarily, adding more data sources and detections to your analytics generates more alerts, and with them a need for more analysts to review them. With RBA, teams can keep adding data sources and detections while their analysts work more efficiently.
Wouldn’t you prefer that your SOC team focus on genuine security issues? They certainly would!
Advantages of Episodes – IT Service Intelligence
Our next example of how to control noise and get a grip on alert fatigue also comes from Splunk. This time, it’s on the ITSI and observability side. Here, thresholds must be tuned properly. However, service owners aren’t always sure when to alert.
Within ITSI, the Notable Event Aggregation Policy, or NEAP, can help reduce noise. As Barry Duncan, Team Lead, Professional Services Observability at SP6, notes, correlation searches decide whether a given situation should raise an alert, and NEAP groups the resulting alerts into logical Episodes.
Splunk correlation searches can be customized and enhanced. The result is a solution that can be built to scale, is highly performant, and is maintainable.
This is the five-step process:
- Create initial notables.
- Group related notables.
- Create additional notables.
- Add alerting.
- Throttle alerts.
Benefits of Episodes – IT Service Intelligence
Again, being understaffed in the technology department is part of the challenge. Let’s say your company has seven web servers and they all crash at once. In most monitoring environments, alerts will pop up for all seven – but you won’t know if they’re related.
With ITSI and NEAP, these can be grouped into one episode or alert. (An SP6 Splunk expert could set that up for you.) The correlation search would run every minute, looking for whichever criteria you set up. Severity works on a scale of 1 to 6, with 1 being informational and 6 being critical. Furthermore, a single episode can be configured to prevent over 100 different alerts from being sent.
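The grouping and throttling idea can be modeled in plain Python. This is a conceptual sketch, not ITSI’s configuration or API: the `Episode` class, the `group_into_episodes` and `alerts_to_send` functions, the `service` grouping field, and the sample notables are hypothetical. Only the seven-server scenario and the 1-to-6 severity scale come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """A group of related notables that is alerted on as one unit."""
    key: str
    notables: list = field(default_factory=list)
    severity: int = 1     # 1 = informational ... 6 = critical
    alerted: bool = False


def group_into_episodes(notables, group_by="service"):
    """Group notables that share a field value; an episode takes its worst severity."""
    episodes = {}
    for notable in notables:
        episode = episodes.setdefault(notable[group_by], Episode(key=notable[group_by]))
        episode.notables.append(notable)
        episode.severity = max(episode.severity, notable["severity"])
    return episodes


def alerts_to_send(episodes, min_severity=5):
    """Throttle: send at most one alert per episode, and only at or above min_severity."""
    messages = []
    for episode in episodes.values():
        if episode.severity >= min_severity and not episode.alerted:
            episode.alerted = True
            messages.append(
                f"{episode.key}: {len(episode.notables)} related notables, "
                f"severity {episode.severity}"
            )
    return messages


# Hypothetical sample: seven web servers crash at once, plus one minor database notable.
notables = [{"host": f"web0{i}", "service": "web", "severity": 6} for i in range(1, 8)]
notables.append({"host": "db01", "service": "db", "severity": 2})

episodes = group_into_episodes(notables)
print(alerts_to_send(episodes))  # one alert covering all seven web servers
print(alerts_to_send(episodes))  # already alerted, so nothing new is sent
```

Seven notables collapse into a single alert that already tells the on-call engineer the failures are related, and the low-severity database notable stays out of the pager entirely.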
Duncan remembers an SP6 client that was receiving repeated alerts via a cloud-based IT service management (ITSM) software platform for the same problem because suppression wasn’t properly established. An engineer went in and focused the search. (Note: Unlike Security, ITSI doesn’t deal with a lot of false positives. If the KPIs are written correctly, they drive correlation searches.)
Noise and Alert Fatigue: In Conclusion
Dealing with excessive environmental noise and struggling with alert fatigue on a regular basis will eventually wear down a security or observability team. This, in turn, can expose your organization to actual threats.
Now is the time to act. Your first step should be to partner with a company experienced in tackling these issues. The experts at SP6 have the knowledge and experience to identify and solve any problems you may have. Start by contacting us today for a no-cost, no-obligation consultation.