Not so long ago we used to hear about a cyber-attack or a new form of vulnerability in the news perhaps on a quarterly or monthly basis. Today, they are becoming increasingly more frequent and I don’t think a day goes by that we don’t read in the headlines about the consequences an organization is having to face, due to another attack. McAfee recently reported a staggering eight new cyber threats a second in Q4 2017.
With the sophistication of attacks also continuously evolving, the modern CISO is now facing up to the fact and preparing for a “when it will happen” scenario as opposed to “if it will happen”, as cyber incidents become more inevitable. Based on this, their cybersecurity strategy is being turned on its head and instead of focusing more on how to prevent an incident from occurring in the first place, they are now heavily investing in technologies and solutions to help identify, manage and contain an incident, in order to minimize the impact to the organization when it does occur.
In larger enterprises today, it is common to have a Security Operations Center (SOC) and/or a Computer Security Incident Response Team (CSIRT) to monitor, manage and respond to incoming security alerts, but with this, there are numerous challenges that are continuously being faced. Our recent blog “How to Implement Incident Response Automation the Right Way” specifically addressed the challenge of increasing volumes of alerts, resulting in an exponential volume of mundane tasks and discussed how utilizing automation should be implemented to overcome this. In reality, the number of challenges is probably many more than what we will cover in this blog, but here are our top five, which we believe are currently having the biggest effect on SOCs and CSIRTs today.
Top 5 Challenges Faced by Security Operations Centers
1. Increasing Volumes of Security Alerts
With the snowballing number of security alerts being received, valuable analyst time is being consumed sorting through a plethora of security alerts. Most commonly, time is wasted performing a multitude of mundane tasks to triage and determine the veracity of the alerts, often resulting in alerts being missed or those of more damaging consequences slipping through the net as they are overlooked. As you can probably imagine, analysts time would be better spent working on the more sophisticated alerts that need human intervention, as well as proactively threat hunting, in order to minimize the time from breach discovery to resolution.
2. Management of Numerous Security Tools
As a wider range of security suites are being adopted by SOCs and CSIRTs, it is becoming ever more difficult to effectively monitor all of the data being generated from the multiplying number of data points and sources. A typical security operations center may use a combination of 20 or more technologies, which understandably can be difficult to monitor and manage individually. It is therefore important to be able to have a central source and single platform to summarize all of the information as it is being generated and to be able to have a helicopter view of your overall security environment to manage, monitor, and measure security operations and incident response processes effectively.
3. Competition for Skilled Analysts and Lack of Knowledge Transfer Between Analysts
With the global cybersecurity talent shortage to hit 1.2m by 2020 and to increase to 1.8m by 2022, the pool of suitable analysts will only continue to diminish over time, with the level of competition becoming more fierce for analysts that have the required skill set. As with most companies and industries, workforce comes and goes, but knowledge transfer is particularly important within a security operations center and incident response teams, in order to ensure the correct response and process takes place within the minimal amount of time, reducing the time to incident detection and time to incident resolution. This lack of knowledge transfer can inevitably lead to increased response times and wasted resources.
4. Budget Constraints with Security Incidents Becoming More Costly
As within most organizations large or small alike, budgets are always restricted in some way, shape or form. In order to authorize spending, a clear positive ROI usually needs to be forecast and/or proven. Security operations and incident response are notoriously difficult to measure, monitor and manage, (why not read our recent whitepaper entitled “KPIs for Security Operations and Incident Response” to learn more), so justifying spend is always difficult. With the increasing number of cyber-attacks, organizations are increasing the level of investment in cyber security tools, but what level of spending is necessary and what amount outweighs the benefits it will achieve? Can you put a price on the consequences of a potential incident such as a data breach, knowing you will likely face a hefty fine, as well as brand and reputation damage?
5. Legal and Regulatory Compliance
Meeting a growing number of legal and regulatory compliance such as NIST, PCI, GLBA, FISMA, HITECH (HIPPA) and GDPR to name a few, as well as industry best practices, will inherently have an impact on any organization, but can have a heavy bearing depending on the specific industry or geographical location. Using the example of the upcoming Global Data Protection Regulation, taking effect on May 25, 2018, it is even more important for security operations centers to have mandatory processes and procedures clearly in place which are conducted in a legally and policy-compliant manner. Providing sufficient incident reporting and breach notification within the required parameters (in the case of GDPR to notify the supervisory authority within 72 hrs of a breach) is going to be key, or the legal, financial and reputational impact and repercussions could be significant.
Based on these five challenges alone, enterprise SOCs and CSIRTs are struggling to remain efficient and effective and are increasingly being forced to do more with less, while striving to keep up with the current threat landscape and a plethora of security alerts.
With security incidents becoming more costly, enterprises need to find new ways to further reduce the mean time to detection and resolution. As a result, security and risk management leaders will see the business need to invest in Security Orchestration, Automation and Response (SOAR) technology and tools, such as the IncMan SOAR platform from DFLabs, to help improve their security operations proficiency, efficacy, and quality, in order to keep their cyber incident under control.
If you are interested in reading more about how SOAR technology can help to address these challenges in more detail, look out for our future blog on the topic coming soon.
One of the most pressing challenges facing cyber security professionals nowadays is probably the sheer number of security incident alerts, which is becoming too high to cope with even for the most expansive and well-equipped security teams. The increased number of alerts is a result of two factors at play, with the exponential boost in cyber attacks in recent years being the more obvious and straightforward one, the other is certainly much more complex and might also seem a bit ironic and surprising, as it arises from the growing use of different tools and devices within an organization, whose original function is to detect and mitigate incidents in the first place.
Security Operations Centers (SOCs) are now utilizing more devices designed to alert security analysts of cyber attacks than ever before, with the side-effect being too many alerts for the security teams to handle. Consequently, some of the most credible threats go by undetected or are simply not acted upon.
Addressing the Threat Noise Issue
With so many systems monitoring potential security threats and incidents creating alerts, and also taking into consideration that in many cases SOCs are severely understaffed, it comes as no surprise that analysts have a hard time staying on top of every single alert and responding to them appropriately and in a timely fashion. Since they don’t have the time or sufficient human resources to handle all alerts, SOCs often choose to disregard some and try to focus on those they deem to be credible, which understandably can lead to real threats slipping through the cracks and inflicting serious and irreparable damage to organizations.
In an effort to address the issue of threat noise, some SOCs opt for either reducing the number of devices generating alerts or expanding their number of staff, but while seemingly simple and straightforward, these options can be both counterproductive and quite costly. However, these are not the only possible solutions to this challenge standing at the disposal of SOCs, as there is another alternative, which would neither allow alerts to go undetected, nor require hiring additional security analysts.
Automating the Most Time-Consuming Parts of the Process
While the number of alerts generated by monitoring devices in some cases doesn’t necessarily have to be a reason for concern for SOCs in itself, the fact that alerts take a significant amount of time to analyze and handle efficiently often makes them an insurmountable challenge for understaffed security teams. One potentially very promising tactics to tackle this challenge effectively, is by enabling an automated response to some specific types of alerts, in an approach that is thought to be able to yield a wide range of benefits to organizations.
The idea is to automate the routine tasks that are repetitive and that do not require a lot of human expertise, but do usually take a lot of time to respond to and handle. By automating the response to these types of alerts, SOC analysts get more time to handle the alerts that pose a greater risk to their organizations, which must be analyzed in a more focused and comprehensive manner.
As noted in a recent SANS Spotlight paper titled “SOC Automation – Disaster or Deliverance”, written by Eric Cole: “The rate at which organizations are attacked is increasing, as is the speed at which those attacks compromise a network – and it is not possible for a human to keep up with the speed of a computer. The only way to beat a computer is with a computer”.
However, it must be noted that the implementation of incident response automation itself brings a certain degree of risk to organizations, as it might produce false positives, with analysts not being able to determine whether specific alerts are legitimate threats or not. This means that if automation is not properly implemented with predetermined processes and procedures in place, they may end up spending much of their time analyzing alerts that aren’t actual attacks and don’t pose any foreseeable danger. Having said that, organizations should not shy away from automation because of these potential drawbacks, but should instead implement it in a balanced and well thought out manner. The key is to manage and control false positives as oppose to simply eliminating them. It is therefore important to only automate the low-risk alerts that are not expected to have a major impact on an organization and leave the more serious threats to be handled by security professionals who can apply their expertise to resolve them.
When deciding whether to adopt automation or not, organizations need to be aware of its pros and cons, and if this assessment is carried out correctly, they will inevitably realize that the advantages of this approach clearly outweigh the disadvantages, that can also be easily controlled and managed to minimize any potential negative impact.
Looking at the pros and cons of automation, it’s easy to see that the most important benefit is the fact that it allows SOCs to monitor and analyze many more incidents than doing it manually, opening up the security team’s bandwidth to focus on the high-risk and high-impact alerts. Other key benefits also include: a more consistent response to alerts and tickets, a higher volume of ticket closure and response to incidents, as well as coverage of a larger area and larger number of tickets. On the other hand, automation can yield false positives that for their part can lead to directing time and resources towards resolving alerts that are not legitimate attacks, consequently leading to organizations potentially shutting down operations, having an impact on their business and their bottom line.
All said and done, automated incident response has the potential to bring significant benefits to organizations, provided that it’s implemented properly and cautiously, with a well-thought out strategy. Overall it should be a serious consideration for any SOC that has to handle large volumes of alerts on a daily basis.
For further information on SOC automation, read the recent SANS Institute Spotlight Paper – “SOC Automaton – Deliverance or Die”:
For many of us around the world February 14th marks St. Valentine’s Day, but for those of us in Europe, this date also marks the beginning of the 100-day countdown to the upcoming enforcement of the General Data Protection Regulation (GDPR).
As most of us are already aware the EU GDPR was adopted in April 2016 and is due to be formally imposed on May 25th, 2018. In a nutshell for those who are not quite so GDPR savvy, the GDPR emphasizes transparency, security, and accountability by data controllers and introduced mandatory Data Protection Impact Assessments (DPIAs) for those organizations involved in high-risk processing. For example, where a new technology is being deployed, where a profiling operation is likely to significantly affect individuals or where there is large-scale monitoring of a publicly accessible area.
Breach Notification Requirements
A DPIA is the process of systematically considering the potential impact allowing organizations to identify potential privacy issues before they arise and come up with a way to mitigate them. In addition, and a highly important aspect for Security Operation Centers (SOCs) and Computer Security Incident Response Teams (CSIRTs) to be fully aware of and responsive to, data processors must implement an internal breach notification process and inform the supervisory authority of a breach within 72 hours. They must also communicate the breach to affected data subjects without due delay or consequently face a penalty of up to EUR 20,000.00 or 4% of worldwide annual turnover for the preceding financial year, whichever is greater.
Incident Response Processes and Best Practices
As the number of breaches has risen and cyber attacks have become more sophisticated, authorities have recognized a need for increased data protection regulation. The number of simultaneous processes required in a typical forensic or Incident Response Scenario has also grown. Processes need to cover a broad spectrum of technologies and use cases must be standardized, and must perform clearly defined, fully documented actions based upon regulatory requirements, international standards and established best practices.
Additionally, context enrichment and threat analysis capabilities must be integrated to facilitate and automate data breach reporting and notification within the timeframe specified by GDPR. Lastly, customized playbooks must be created to permit rapid response to specific incident types, aid in prioritizing tasks, assignment to individual stakeholders, and to formalize, enforce and measure specific workflows.
Incident Response Management with DFLabs IncMan
Having a platform in place to formalize and support these requirements is crucial. DFLabs IncMan provides all the necessary capabilities to facilitate this. Not only do organizations need an Incident Response plan, they must also have a repeatable and scalable process, as this is one of the steps towards compliance with the GDPR’s accountability principle, requiring that organizations demonstrate the ways in which they comply with data protection principles when transacting business. They must also be able to ensure that they will meet the 72-hour breach notification requirement or face a stiff penalty.
Organizations must establish a framework for accountability, as well as a culture of monitoring, reviewing and assessing their data processing procedures to detect, report and investigate any personal data breach. IncMan implements granular and use-case specific incident response procedures with data segregation and critical security control requirements. To enable Incident Response and breach notification in complex organizations and working across different regions, IncMan can be deployed as a multi-tenant solution with granular role-based access.
Cutting Response Time and Accelerating Incident Containment
Automated responses can be executed to save invaluable time and resources and reduce the window from discovery to containment for an incident. Organizations can easily prepare advanced reports from an automatically collected incident and forensic data, and distribute notifications based on granular rules to report a breach and notify affected customers when required to comply with GDPR and avoid a financial penalty.
Finally, the ability to gather and share intelligence from various sources by anonymizing the data to share safely with 3rd party protect the data without inhibiting the investigation. IncMan contains a Knowledge Base module to document playbooks, threat assessment, situational awareness and best practices which could be shared and transferred across the organization.
IncMan and Fulfilling GDPR Requirements
In summary, DFLabs IncMan Security Automation and Orchestration platform fulfills the requirements of GDPR by providing capabilities to automate and prioritize Incident Response through a range of advanced playbooks and runbooks, with related enrichment, containment, and threat analysis tasks. It distributes appropriate notifications and implements an Incident Response plan (IRP) in case of a potential data breach, with formalized, repeatable and enforceable incident response workflows.
IncMan handles different stages of the Incident Response and Breach Notification Process, providing advanced intelligence reporting with appropriate metrics, with the ability to gather or share intelligence with 3rd parties as required.
So, this Valentine’s Day, we hope that you are enjoying a romantic dinner for two, knowing that your SOC and CSIRT, as well as the wider organization, has the necessary incident response and incident management best practices implemented to sufficiently meet the upcoming GDPR requirements in 100 days’ time. If not, speak to one of our representatives to find out more.
According to Verizon’s Data Breach Investigations report 2017, social engineering was a factor in 43% of breaches, with Phishing accounting for 93% of social attacks.
Our premise is that an incident appears to be a Spear Phishing attempt has been forwarded to the SOC. The SOC team must qualify the incident and determine what needs to be done to mitigate the attack.
We begin our investigation with an incident observable, a fully qualified domain name (FQDN).
We will correlate the FQDN with several external threat intelligence services to assess whether this is truly an ongoing Phishing attempt or a benign false positive. We have used VirusTotal and Cisco Umbrella in this example, but other threat intelligence and malware services could be used instead.
We have 3 different potential outcomes and associated decision paths:
The R3 Runbook
1. The FQDN is automatically extracted from the incident alert and then sent to Cisco Umbrella Investigate for a classification.
2. Depending on the outcome – whether Cisco Umbrella Investigate classifies the FQDN as benign or malicious – we can take one of two different paths.
3. The FQDN will be rechecked with VirusTotal to verify the result. We do this whether the first classification was malicious or benign. At this point we do not know whether one of the two services is returning a false positive or a false negative, so we do a double check.
4. IF both external 3rd party queries confirm that the FQDN is malicious, we have a high degree of certainty that this is a harmful Phishing attempt and can step through automatically to containment. In our example, we automatically block the domain on a web gateway.
5. Alternatively, if only one of the two queries returns a malicious classification, we need to hand the runbook off to a security analyst to conduct a manual investigation. At this point, we cannot determine in an automated manner where the misclassification resides. It could be that one of the services has stale data, or doesn’t include the FQDN in its database. With the ambiguous result, we lack the degree of confidence in the detection to trust executing fully automated containment.
6. If both VirusTotal and Cisco Umbrella Investigate return a non-malicious classification, no further action will be necessary at this point. We will notify the relevant users that the incident has been resolved as a false positive and can close the case for now.
This R3 Phishing Runbook demonstrates the flexibility and efficiency of automating incident response . Incident Qualification is automated as much as is feasible but keeps a human in the loop when cognitive skills are required. It only automates containment when the degree of confidence is sufficient. It eliminates false positives without requiring human intervention.
At the heart of incident response, and by extension of Security Automation and Orchestration technologies, resides the Cyber Incident. A typical definition of a cyber security incident is “Any malicious act or suspicious event that compromises or attempts to compromise, or disrupts or tries to disrupt, a critical cyber asset”. Almost everything we do in a SOC or a CSIRT is based on incidents, and there are a variety of potential incident sources, for example:
- Alerts from cyber security detection technologies such as Endpoint Detection & Response or User Entity Behavior Analytics tools
- Alerts from Security Information & Event Management Systems (SIEM)
- Emails from ITSM or case management systems
- Website submissions from internal stakeholders and whistle-blowers
- Phone calls from internal users and external 3rd parties
This diversity of incident sources means that a solid SAO solution must offer a variety of different methods to create incidents. Regulatory frameworks also frequently mandate being able to originate incidents from different sources. DFLabs IncMan offers a rich set of incident creation options.
There are three primary ways to create incidents in IncMan, offering flexibility to accommodate a variety of incident response process requirements and approaches.
Option 1: Automated Incident Creation
We will feature automated incident creation in a more detail in a future post. In the meantime, I will show you the location of this feature.
Select settings menu, then head to the external sources:
You will see that under the external sources option there are 3 options available to use as sources to automate incident creation:
- Incoming events automation, for CEF/Syslog
- Incoming Mail automation, for a monitored email account
- Integrations, for all QIC integration components.
Automating incident creation supports a variety of filters to support a rules-based approach. In addition, it is also possible to create incidents using our SOAP API. Certified 3rd party applications use this mechanism to create incidents within IncMan, for example, Splunk.
Option 2: Manual Incident Creation
Click the incidents menu option, then click the + symbol selecting the incidents screen
Fill out all mandatory fields (these can be defined in the custom fields screen) then step through and complete the incident wizard to create the incident:
Once all relevant fields have been completed, click save and this incident will then appear in the incident view and apart of the queue you assigned in the details screen.
Option 3: Incident creation from source
Select an incident source for the incident you want to create, for example, a Syslog or CEF message, an Email, or a Threat intelligence source (STIX/TAXI, ThreatConnect):
In this screen, you can then convert this source item to an incident, or link the source to an existing incident.
Since I am a new face (or perhaps just a name to most of you) here at DFLabs, I wanted to take a moment to introduce myself before we jump into the topic for today. My name is John Moran and I recently joined the DFLabs team as Senior Product Manager. Prior to joining the DFLabs team, I worked in a variety of roles, including incident response consulting, security operations and law enforcement. While I have many responsibilities at DFLabs, one of my primary roles and the one that I am perhaps most passionate about is ensuring that DFLabs continues to bring you the industry leading security orchestration, automation and response feature that you have come to expect from IncMan. If you have feature requests, suggestions or other comments, good or bad, regarding IncMan, I’d love to hear from you. Please reach out to me at [email protected]. With that out of the way, let’s get to the good stuff…
While reports such as the Verizon DBIR indicate that the increased focus on creating holistic, detect and respond security programs has had a positive impact on reducing the time to detect security incidents, these same reports have also shown that attackers are continuing to evolve. There is still a continuing gap from compromise to detection. what I would like to discuss here instead though, might be described as the opposite problem; overreaction to a perceived security incident, or conducting a full-scale response to a security incident prior to validating that a security incident has indeed occurred.
Please do not misunderstand what I am saying, I will always advocate the “treat it as an incident until you know otherwise” approach to incident response. However, I would also encourage that the response to any security incident should always be a measured response. The incident response process must be rapid and decisive; but just as under-responding to an incident can present serious financial and reputational risks to an organization, so too can over-responding to a potential security incident. As with any other business process, incident response must provide value to an organization. Continued over-response to perceived security incidents will reduce the overall value that incident response provides to an organization, and over time will result in decreased support from management.
Few studies have truly been able to quantify the costs associated with failing to conduct a measured response. A 2015 study by the Ponemon Institute suggests that response to incidents detected based on erroneous or inaccurate malware alerts costs large organizations up to 395 hours-per-week, or almost $1.3 million a year. It is important to note that this study only took into consideration time spent investigating malware alerts. While malware detection technologies have undoubtedly improved in the two years since this study was conducted, most organizations have a variety of detection technologies, all generating alerts which must be investigated. It was assumed by Ponemon that the organizations surveyed were conducting an appropriate, measured response to each of these false positives. With the cost already so high, it is easy to conclude how costly over-responding to incidents can become at scale.
While conducting incident response consulting, I have personally seen organizations spend weeks to months conducting full-scale incident response activities before spending tens of thousands of dollars for incident response consulting, only to find out that the perceived incident was based on faulty information or conclusions. So how do you minimize the risk of over-responding while continuing to ensure that each potential incident is properly investigated? Here are five tips based on my experience:
- Have the right people in place – There is simply no substitute for having the right people in place. While proper training and experience are vital, the qualities of an effective analyst extend beyond these two attributes. It is crucial to have analysts who possess an analytical mindset and can remain level-headed amidst a stressful and dynamic environment. Training and be provided, the experience can be gained, however, some of these less tangible qualities are much harder to learn.
- Have the right toolsets in place – Attempting to substitute tools for skills will inevitably lead to failure. However, it is important to have the proper tools in place to give those highly skilled analysts the information they need to make fact-based conclusions. Even the most highly skilled analysts will inevitably arrive at the wrong conclusion when presented with incomplete or inaccurate information.
- Know the threat landscape – Threat intelligence, and I mean actual intelligence, not just a machine-readable threat feed, can provide much greater context surrounding a potential security incident. Analysts must also be provided the opportunity to remain up-to-date on the ever-changing threat landscape. This can allow decision makers a much more accurate perspective on which to base their initial level of response. Often, it is a lack of knowledge and conclusions based on assumptions that lead to a dramatic over-response.
- Know your limitations – Unless you are fortunate enough to work for a government agency or one of the world’s largest organization, chances are at some point your needs may exceed the scope of your internal capabilities. These limitations are not weaknesses in and of themselves. Instead, the risk here presents itself when an organization fails to realize its limitation and attempts to work outside of those bounds. It is important to know when to consider tapping into external resources such as consulting, incident response retainers and managed services.
- Replace the emotional response with processes and procedures – Even the most highly skilled analysts will approach some potential security incidents with certain biases or preconceived notions. It is essential to implement quality processes and procedures which maximize the analyst’s skills, take full advantage of the available tools, and guide the incident response process. Processes and procedures surrounding incident validation, incident classification and initial resource allocation can ensure that the process stays on track and avoid straying down the wrong, costly road.
The most important goal of any security program must always remain to never under-respond to an incident. However, integrating these five tips into your security program will undoubtedly provide a better, more efficient process to determine what the appropriate level of response to each potential security incident should be, greatly reducing the risk of over-responding.
We released our Machine Learning Engine PRISM in our most recent 4.2 release. The first capability that we developed from PRISM is our Automated Responder Knowledge (ARK). This capability will change the way incident responders and SOC analysts respond to incidents, and how they share and transfer their entire knowledge to the rest of the team. The key to this capability is that it learns from your own analyst’s responses to historical incidents to guide the response to new ones.
We are not re-inventing the wheel with this feature. SOC and Incident Response teams have been doing this the old-fashioned way for a long time – through 6-12 months training. What we’re doing is providing a GPS and Satellite Navigation, guiding the wheel and giving you different paths to choose from according to the terrain you are in.
We do this by analyzing incidents and their associated attributes and observables, to work out how closely they are related. Then we can suggest actions and playbooks based on your organizations’ historical responses to similar threats and incidents.
Using Automated Responder Knowledge (ARK) in IncMan
Step 1: Not really a step – as it’s done automatically by Automated Responder Knowledge (ARK), but this occurs in the background for every incoming incident. Every Incident possesses a feature space1 that contains all the information related to it, composed of every attribute, associated observable and attached evidence. ARK analyses the feature spaces associated with every incident ever resolved. When a new incident is opened, it is scored and ranked and then compared by ARK to the historical model to identify related incidents or actions based on similar and shared attributes. The weighting of the ranking can be customized by analysts.
Step 2: Open the incident, selecting the applicable incident type. To save time, you can create an incident template to prepopulate some of the contexts automatically in future.
Step 3: Select Playbooks, and PRISM.
In the next screen, you will see a variety of suggested related actions and related incidents based on the feature space that your incident type is matched with. The slider at the top is used to determine the weighting in ranking for actions that are suggested. For example, if I move the slider to the left, the entire feature space actions appear, then if I move the slider to the far-right only a few actions appear from highly ranked incidents.
Step 4: Determine which automation and actions you want to use from the suggestions. After saving, you will be presented with options such as Auto-Commit, Auto-Run, Skip Enrichment, Containment, Notification or Custom Actions. You have the ability to select only the actions you want to automate. If you are concerned about running containment automatically, for example, you just deselect those options.
Step 5: The automated actions are executed, resolving the incident, based on prior machine-learning generated automated responder knowledge.
Last week, Anton Chuvakin from Gartner announced that Augusto Barros and himself are planning to conduct research in Q4 2017 on the topic of Security Orchestration, Automation and Response (SOAR), or Security Automation and Orchestration, depending on which analyst firms’ market designation you follow. At DFLabs we are very excited that Gartner is finally showing our market space some love and will be helping end users to better assess and differentiate SAO offerings.
Anton provided many questions that he wanted SAO vendors to prepare for. The questions immediately piqued our interest, with one question, in particular, standing out to us.
1.When is SOAR a MUST have technology? What has to be true about the organization to truly require SOAR? Why your best customer acquired the tools?
Anton also said that he had one main problem with Security Automation and Orchestration. In his own words, “For now, my main problem with SOAR (however you call those security orchestration and automation tools…if you say SOAPA or SAO we won’t hate you much) is that I have never (NEVER!) met anybody who thought “my SOAR is a MUST HAVE.”
The question is not entirely unwarranted. During my own time at Gartner covering the SOAR space, I spoke to many clients who were seeking an SAO solution without knowing that they were. Typical comments were, “I have too many alerts and false positives to be able to deal with them all”, or “We are struggling to hire enough skilled people to be able to respond to all of the incidents that we have to manage”. Another common comment was, “I am struggling to report operational performance to my executives?”. Often, these comments were followed by the question, “Do you know of any technology that can help?”.
Typically, these organizations had a mature security monitoring program, usually built around a SIEM. They often had critical drivers, such as regulatory requirements, or held sensitive customer data. We hear the same buying drivers from our own customer base.
To sum up the most common drivers for someone asking about Security Automation and Orchestration:
- A high volume of alerts and incidents and the challenge in managing them
- A large portfolio of diverse 3rd party security detection products resulting in a large volume of alerts
- Regulatory mandates for incident response and breach notification
- An overstretched security operations team
- Reporting risk and the operational performance of the CSIRT and SOC to an executive audience
One interesting thing is that when there is no external driver like regulatory compliance, deploying a Security Automation and Orchestration solution is often determined by maturity. Most organizations don’t realize that they will be unable to cope with the volume of alerts and the resulting alert fatigue until they have deployed a SIEM and a full advanced threat detection architecture.
The common misconception is that the SIEM can help to reduce the number of incoming alerts by applying correlation rules. This not entirely untrue, but correlation rules will only reduce a small percentage. They are essentially signature based. You need to know in advance what you want to correlate, and adding a correlation rule to cover all and every incoming alert is not a trivial task. Even with correlation rules, additional work will be required to qualify an incident. Gathering additional IoC’s, incident observables and context is still a very manual process. Lastly, detection is only one part of the entire incident response process – notifying stakeholders, gathering forensic evidence and threat containment will also have to be done manually. These are the areas where SAO solutions provide the greatest ROI – as a force multiplier.
Threat actors are increasingly adopting security automation and machine learning – security teams will have to follow suit, or risk falling behind.
Many organizations still conduct incident response based on manual processes. Many playbooks that we have seen in our customer base, for example, hand off to other stakeholders within the organization to wait for additional forensic data, and to execute remediation and containment actions.
While this may seem like good practice to avoid inadvertent negative consequences such as accidentally shutting down critical systems or locking out innocent users, it also means that many attacks are not contained in a sufficiently short time to avoid the worst of their consequences.
Manual Processes Cannot Compete with Automation
Reports are mounting about threat actors and hackers leveraging security automation and machine learning to increase the scale and volume, as well as the velocity of attacks. The implications for organizations should be cause for concern, considering that we have been challenged to effectively respond to less sophisticated attacks in the past.
Ransomware is a case in point. In its most simple form, a ransomware attack does not require the full cyber kill chain to be successful. A user receives an email attachment, executes it, the data is encrypted and the damage is done. At that point, incident response turns into disaster recovery.
Automated attacks have been with us for a long time. Worms and Autorooters have been around since the beginning of hacking, with WannaCry and its worming capability only the most recent example. But these have only automated some aspects of the attack, still permitting timely and successful threat containment further along the kill chain.
Threat actors have also leveraged automated command and control infrastructure for many years. DDoS Zombie Botnets, for example, are almost fully automated. To sum it up, the bad guys have automated, the defenders have not. Manual processes cannot compete with automation.
With the increase in the adoption of automation and machine learning by cyber criminals, enterprises will find that they will have to automate as well. The future mantra will be “Automate or Die”.
Making the Cure More Palatable Than the Disease
But automating containment actions is still a challenging topic. Here at DFLabs we still encounter a lot of resistance to the idea by our customers. Security teams understand that the escalating sophistication and velocity of cyber-attacks means that they must become more agile to rapidly respond to cyber incidents. But the risk of detrimentally impacting operations means that they are reluctant to do so, and rarely have the political backing and clout even if they want to.
Security teams will find themselves having to rationalize the automation of incident response to other stakeholders in their organization more and more in the future. This will require being able to build a business case to justify the risk of automating containment. They will have to explain why the cure is not worse than the disease.
There are three questions that are decisive in evaluating whether to automate containment actions:
- How reliable are the detection and identification?
- What is the potential detrimental impact if the automation goes wrong?
- What is the potential risk if this is not automated?
Our approach at DFLabs to this is to carefully evaluate what to automate, and how to do this safely. We support organizations in selectively applying automation through our R3 Rapid Response Runbooks. Incident Responders can apply dual-mode actions that combine manual, semi-automated and fully automated steps to provide granular control over what is automated. R3 Runbooks can also include conditional statements that apply full automation when it is safe to do so but request that a human vet’s the decision in critical environments or where it may have a detrimental impact on operational integrity.
We just released a whitepaper, “Automate or Die, without Dying”, by our Vice President of Product Evangelism and former Gartner analyst, Oliver Rochford, that discusses best practices to safely approach automation. Download the whitepaper here for an in-depth discussion on this controversial and challenging, but important topic.
Let me start by saying that total prevention is not attainable with today’s technology. Whether through negligence or ignorance, any data stored on a network is subject to unauthorized access by 3rd parties. Instead, we must combine Prevention with Detect and Respond. We know we are going to get breached, so we must focus on the how we deal with that.
One significant activity that can improve cyber incident response and enable the timely mitigation of threats is the transfer of knowledge after an incident as part of a formalized “Lessons Learned” phase of the incident response life cycle. Integrating successful processes and procedures from previously successful incident response activities can play a critical role in determining whether a business will suffer in terms of operational integrity, reputation and legal liability. A publicized security breach will lower customer confidence in the services offered by an organization as well as call into question the safety of their sensitive 3rd party information. This impacts a business credibility and translates directly into lost revenue.
In regulated industries, increased regulatory scrutiny is an additional consequence of a breach. This involves evaluating if the tools and procedures used in responding to security threats were sufficient. Integrating lessons learned into existing and future incident response playbooks ensures that the proper technologies and processes are deployed, and avoids accusations of gross negligence, expensive and time-consuming investigations and regulatory demands.
Procedural improvements can be incorporated into incident workflows via incident playbooks and ensure that all stages of the incident response process have been acknowledged and addressed. It also ensures that required security measures and procedures are documented and relevant stakeholders informed of their roles in case of an incident.
This process can be augmented through machine learning. Applying machine learning to this problem requires that all relevant data associated with incidents are analyzed and automatically applied to future incidents. DFLabs recently released DF-ARK machine learning capability to do precisely this. Our patent-pending Automated Responder Knowledge (DF-ARK) module applies machine learning to historical responses to threats and recommends relevant runbooks and paths of action to manage and mitigate them. DF-ARK requires sufficient training data – it begins with no knowledge, but learns from the experience and actions of your security team, becoming more effective over time. DF-ARK implements supervised case-based reasoning machine learning.
It also involves combining automated workflows and manual procedures to keep a human in the loop. This can be constantly improved by applying new observations and data, to fine tune existing methods and procedures identified in the lessons learned phase.
IncMan offers the R3 Rapid Response Runbook engine and Dual Mode playbooks to facilitate this. R3 Runbooks are created using a visual editor and support granular, stateful and conditional workflows to orchestrate and automate incident response activities such as incident triage, stakeholder notification, data and context enrichment and threat containment. Dual Mode Playbooks support manual, semi-automated and automated actions, meaning that users can automate the action without automating the decision.
Adding all of this together, here are 5 best practices for increasing the effectiveness of incident response via lessons learned:
- Encourage feedback from responders at every level. First, second and third line SOC operators and incident handlers each have a unique perspective that must be incorporated into future response playbooks.
- Review all relevant documentation to ensure compliance. This includes organizational policies or regulatory mandates to ensure any disparities are addressed in future playbooks.
- Chronicle any unanticipated or unusual events to extend procedures to mitigate similar occurrences in the future
- Annotate enhancements to existing processes that were identified during the incident response cycle.
- Designate a business unit or individual to be responsible for making necessary changes to existing playbooks, processes or procedures and to distribute these to stakeholders.
Capitalizing on lessons learned during incident response provides immediate and long-term benefits that contribute crucial time savings necessary to successfully mitigate future threats. Deploying a platform designed to facilitate the rapid inclusion of identified improvements to the incident workflow, such as DFLabs’ IncMan, can not only reduce the time it takes to fully investigate an incident but also reduces the overheads required to do so. If you want more information please contact us at DFLabs for a no obligation demonstration of exactly how we can improve your response time, workflows and remediation activities.