In my last post, we talked about the Defenders’ Dilemma, citing the challenges SOC teams face from our recently released State of Threat Detection 2023 report. To us, the results were somewhat expected, given the common themes we’ve been hearing from customers, peers and security professionals over the past few years. They confirm what we knew to be true: most current approaches to threat detection and response are broken, and here’s why:
- SOC teams receive an average of 4,484 alerts per day; two-thirds (67%) of them are ignored, and 83% are false positives.
- Nearly three-quarters (71%) of analysts admit the organization they work in may have been compromised and they don’t know about it yet.
- Most (97%) of analysts worry they’ll miss a relevant security event because it was buried in a flood of security alerts.
- Yet, 90% of SOC analysts say their detection and response tools are effective.
I don’t know about you, but why should the definition of “effective” come down to the ability to generate an alert? That appears to be the measuring stick 90% of SOC analysts are using, despite believing they are already compromised and feeling genuine anxiety about missing a real attack. These are perplexing questions, and to dig up some answers, Kevin Kennedy, SVP of Product at Vectra AI, and Matt Bromiley, Lead Solutions Engineer at LimaCharlie and SANS Instructor, took part in a recent webinar with SANS where they threw down the gauntlet: “It’s time security vendors are held accountable for the efficacy of their signal.”
We dove into the research and asked Kevin, who represented the voice of the vendor, and Matt, who represented the voice of the practitioner, three simple questions:
- More attack surface, more alerts, more false positives – what is the problem to solve?
- More blind spots, more compromise, more turnover – where should we begin addressing the problem?
- More visibility, more detections, more signal – what makes a SOC analyst most effective at their job?
Below are some highlights from the conversation. You can watch the entire conversation with Kevin and Matt here.
Q1: More attack surface, more alerts, more false positives – what is the problem to solve?
Matt: “I think a couple of things have happened from a tool and technology perspective, first off, that have opened this up. I think if we asked this question 4 or 5 years ago, the results would have been very different. Attack surface, I should say, an understanding of the attack surface, is something that has grown over the past few years. I think we've increased our knowledge about our environment. One of the problems is that (an expanding attack surface) should not translate to just adding more stuff to the alert queue and figuring it out later. I think what's happened here is that attack surface and alerts have become synonymous for some organizations and for some SOC analysts.
It's about how we classify them (alerts). I don't want my vulnerable applications to be reported in the exact same queue that reports on privilege escalation or the running of Mimikatz or something like that. I would look at it almost as a security posture queue versus an actual adversary activity queue, or something along those lines. And that classification might ease the tension on SOCs a little bit, just because you're no longer having to think of a vulnerable, internet-facing web application as an emergency to go fix right now. And you know, when we hear something like 4,500 alerts a day, those numbers are just staggering, (and) this seems low compared to some of the SOCs I'm seeing who are barely, barely holding their noses above water. Who owns the alerts is another part of the problem. Who is actually responsible for fixing it? Is security responsible for fixing it, or is security responsible for following up on it? A lot of times it's the latter, but we confuse it with the former.”
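To make the split Matt describes concrete, here is a minimal sketch of what routing alerts into a posture queue versus an adversary-activity queue might look like. This is purely illustrative on our part, not something shown in the webinar or taken from any particular product, and every alert type and class name below is hypothetical:

```python
from dataclasses import dataclass, field

# Hypothetical alert categories: posture findings (vulnerable app, weak config)
# versus adversary activity (privilege escalation, credential dumping).
POSTURE_TYPES = {"vulnerable_web_app", "missing_patch", "weak_tls_config"}
ADVERSARY_TYPES = {"privilege_escalation", "credential_dumping", "lateral_movement"}


@dataclass
class Alert:
    alert_type: str
    source: str
    details: str


@dataclass
class SocQueues:
    posture: list = field(default_factory=list)       # fix on a schedule
    adversary: list = field(default_factory=list)      # investigate now
    unclassified: list = field(default_factory=list)   # still needs a triage rule

    def route(self, alert: Alert) -> None:
        """Send each alert to the queue that matches its category."""
        if alert.alert_type in ADVERSARY_TYPES:
            self.adversary.append(alert)
        elif alert.alert_type in POSTURE_TYPES:
            self.posture.append(alert)
        else:
            self.unclassified.append(alert)


queues = SocQueues()
queues.route(Alert("vulnerable_web_app", "scanner", "outdated framework on app01"))
queues.route(Alert("credential_dumping", "EDR", "Mimikatz-like behavior on host42"))
print(f"posture: {len(queues.posture)}, adversary: {len(queues.adversary)}, "
      f"unclassified: {len(queues.unclassified)}")
```

The point is not the code itself but the separation of concerns: posture items go to a remediation backlog, while only adversary-activity alerts compete for an analyst's immediate attention.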
Kevin: “I think the problem is the approach. If you look at how this has evolved, you rewind 10 years, it was throw data at it, write rules, and it was going to give the signal. Some organizations have made that work; most have not. And so, you saw this evolution of attack surface and a tool dedicated to each for (the purpose of) detection and response. There's EDR, there's NDR, there's identity, there's cloud detection and response. So everything is a point solution, and pretty much every tool optimizes for coverage, and often low-level coverage, not really thinking about noise. That's really where the incentives sit from a vendor standpoint. It's low-level coverage because of the way that the products get tested and evaluated. That's the drive and the reality. It's also a lot cheaper to develop low-level, noisy coverage than it is to actually get after a signal. And so, you just flood (analysts) with alerts, but you have coverage. The false positive or the actual malicious activity is going to be in there, if you can find it. If you flag every single remote code execution, you're going to catch the malicious ones. You're also going to get every admin doing their job, and you're going to deliver it cheaply.
It's not actually solving the problem; it's contributing to the problem. And this carries through into things like testing. So, if you look at MITRE, we map all of our detections to those methods. We think it's a really useful library and language. But if you look at the testing they do, mainly focused on EDRs, there is zero consideration of false positives. It is only a malicious method, fired in the environment, and the question is: did this detect it or not?
There is no check of whether normal activity is firing tens of thousands of alerts. And if that's the way you're testing and evaluating, and that's the way the incentives are set up, then you're going to get products that support that. And then you say, okay, security engineering, it's your problem to figure out how to make sense of it. We're burning people out with this approach. First of all, if we're throwing 4,500 alerts on average at analysts, there's not a lot of time to even use their creative skills, when your bias is going to be: close this alert really freaking fast and get to the next one, because I'm being measured on whether I get through the queue and how long it took. That's absolutely the wrong place to be. It doesn't allow creativity to come in. The number one thing is creating space and time for analysts to use their creativity by giving them what is relevant to look at, plus the context and the data, in order to make the right decision quickly and efficiently. If it is real, stop the attack. That's how we bring out the best in an analyst.”
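To see why Kevin's critique of coverage-only testing matters, a little arithmetic helps. The sketch below is our own illustration with invented numbers, not MITRE's methodology or data from the webinar: a detector that flags every remote code execution achieves perfect coverage, yet its precision, which such tests never measure, collapses once routine admin activity is counted.

```python
# Hypothetical numbers: a "noisy" detector that flags every remote code
# execution versus a "tuned" one that only flags suspicious executions.
def evaluate(true_positives: int, false_negatives: int, false_positives: int):
    coverage = true_positives / (true_positives + false_negatives)   # what coverage tests measure
    precision = true_positives / (true_positives + false_positives)  # what they ignore
    return coverage, precision


# 10 malicious RCEs in the test window, 5,000 admins doing their jobs.
noisy = evaluate(true_positives=10, false_negatives=0, false_positives=5000)
tuned = evaluate(true_positives=9, false_negatives=1, false_positives=20)

print(f"noisy detector: coverage={noisy[0]:.0%}, precision={noisy[1]:.2%}")
print(f"tuned detector: coverage={tuned[0]:.0%}, precision={tuned[1]:.2%}")
```

Measured only on “did it detect?”, the noisy detector wins; the analyst living in the queue feels only its precision.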
Q2: More blind spots, more compromise, more turnover – where should we begin addressing the problem?
Kevin: “I think turnover is the outcome, not the source of the problem. I think the way we begin addressing this is with an integrated signal that is more accurate. It comes down to how you consolidate where you're getting your attack signal from. There's always a lot of talk about a single pane of glass to rule them all. That's a great ambition to plan towards, but it's highly unlikely you will get there in a single step with a lot of point products. So, think about how you consolidate over time, and really hold us vendors accountable for the quality of the signal we're providing. If you want to know who's adding value, run a red team based on your threat model and your adversaries, and see if your tooling shows that it happened and that you knew while it was happening. If we're able to get to integrated, accurate signal, and you hold us accountable for that, it's going to go a long way towards addressing some of these core issues.”
Matt: “And because it focuses on giving the SOC the power to step back and say, hey, you know, the problems that I'm seeing are stemming from one or two tool sets in the environment. I hope ‘which tool was the worst to deal with?’ isn't a question saved for the exit interview at any organization. Sit down and have that discussion with your team now: are we getting the value we need out of the stack we have? Address that as a touch point and get input from the SOC team. If people are drowning in alerts every day and considering leaving, I view that as one of the (pains) that can also be dealt with from a management perspective. You don't want to lose anyone on the team, because your environment has unique niches that only your team knows. What are our weaknesses as an org? Sit down and have that chat; it's a great way to start to address these concerns.”
Q3: More visibility, more detections, more signal – what makes a SOC analyst most effective at their job?
Matt: “Along with the idea of burnout also comes the idea that my voice doesn't matter, that drowning SOC analysts are just the way it is. And it's funny, I was reading a thread on a discussion forum, I think yesterday or the day before, where someone was like, look, I'm just burnt out. I'm just done. And that's it. And it's funny because they followed up with, I'm thinking of changing jobs. And the first response was: well, if you're burnt out, where are you going to go? When we look at effectiveness, where can I find the most value and efficiency? I want to echo Kevin's sentiments here. If there is an area where your tools are not being effective (and 44% came back and said, yeah, we think our vendors could be held more accountable), then for that 44%, guess what you're doing tomorrow? You're holding your vendor accountable for the signals they're sending you, and then you're using that as your movement forward.
If I wanted to rank SOC analyst effectiveness, I'm not really looking at the number of tickets closed; I shouldn't be. What I want to look at is: yesterday this thing took us two hours to do, and today it took us an hour and 50 minutes. You just saved time. You gained us productivity. How did you do that?
If your goal is to just close out as many tickets as you can, then that's where your bias comes in. That's where your lack of attention to detail comes in, and you're not utilizing all those human things you're good at; you're just closing out tickets because that's the incentive you're measured on. It feels like the numbers are almost the opposite of what we expected. I would have expected more dissatisfaction than we're seeing in (the report).”
Kevin: “I think SOC analyst effectiveness goes back to giving analysts space and time to be creative. Your voices need to be heard around tool effectiveness after the fact, but also when you're choosing (a tool). We work with a lot of SOC teams, and security engineering is on the line for the criteria around how you choose the tooling. Look at how it works with other tools, so you can automate more. Look at the quality. Run a red team. Understand your threat model.”
Takeaways
Over the course of this one-hour conversation, we walked away with a clear view of the problem to solve: alerts, and not just alert noise (anyone can claim to reduce noise). The core problem is the approach to alerts: the use of alerts as a measuring stick, the ownership of alerts, and defining what warrants an alert. In terms of the priorities to set, we talked about integrated signal: the ability to consolidate and measure signal effectiveness. Make sure you (SOC analysts) have a voice informing management and security engineering on the effectiveness of that signal, because if it's an ineffective signal, you have to tell someone. We also talked about how SOC leaders need to be tapping the SOC analyst on the shoulder and asking for input: what can be done to make the job better?
Want to hear more from Kevin and Matt? Listen to their previous conversation, “Breaking away from the spiral of ‘more’ in security: Why the only ‘more’ security needs is more Attack Signal Intelligence,” here.