We released our Machine Learning Engine PRISM in our most recent 4.2 release. The first capability that we developed from PRISM is our Automated Responder Knowledge (ARK). This capability will change the way incident responders and SOC analysts respond to incidents, and how they share and transfer their entire knowledge to the rest of the team. The key to this capability is that it learns from your own analyst’s responses to historical incidents to guide the response to new ones.
We are not re-inventing the wheel with this feature. SOC and Incident Response teams have been doing this the old-fashioned way for a long time – through 6-12 months training. What we’re doing is providing a GPS and Satellite Navigation, guiding the wheel and giving you different paths to choose from according to the terrain you are in.
We do this by analyzing incidents and their associated attributes and observables, to work out how closely they are related. Then we can suggest actions and playbooks based on your organizations’ historical responses to similar threats and incidents.
Using Automated Responder Knowledge (ARK) in IncMan
Step 1: Not really a step – as it’s done automatically by Automated Responder Knowledge (ARK), but this occurs in the background for every incoming incident. Every Incident possesses a feature space1 that contains all the information related to it, composed of every attribute, associated observable and attached evidence. ARK analyses the feature spaces associated with every incident ever resolved. When a new incident is opened, it is scored and ranked and then compared by ARK to the historical model to identify related incidents or actions based on similar and shared attributes. The weighting of the ranking can be customized by analysts.
Step 2: Open the incident, selecting the applicable incident type. To save time, you can create an incident template to prepopulate some of the contexts automatically in future.
Step 3: Select Playbooks, and PRISM.
In the next screen, you will see a variety of suggested related actions and related incidents based on the feature space that your incident type is matched with. The slider at the top is used to determine the weighting in ranking for actions that are suggested. For example, if I move the slider to the left, the entire feature space actions appear, then if I move the slider to the far-right only a few actions appear from highly ranked incidents.
Step 4: Determine which automation and actions you want to use from the suggestions. After saving, you will be presented with options such as Auto-Commit, Auto-Run, Skip Enrichment, Containment, Notification or Custom Actions. You have the ability to select only the actions you want to automate. If you are concerned about running containment automatically, for example, you just deselect those options.
Step 5: The automated actions are executed, resolving the incident, based on prior machine-learning generated automated responder knowledge.
Attackers spend a considerable amount of time conducting reconnaissance on compromised networks to gain the information that they need to complete their objectives for criminal activity, including fraud and intellectual property theft. Dwell time, the amount of time an attacker is present in an enterprise is currently measured in the hundreds of days.
One of the most effective technologies available to incident response teams to help to reduce the threat actor dwell time and limit the loss of confidential data and damage, are Security Automation and Orchestration platforms. Security Automation and Orchestration technologies process alerts and correlates these with threat actors’ Tactics, Techniques, and Procedures. The ability to determine not only the initial ingress point of the attacker but any lateral movement inside the enterprise significantly reduces the time to deploy containment actions. In this scenario, the incident correlation engine is utilized not only as a mechanism for responding and orchestrating the response but also to proactively search for related IoC’s and artefacts. The synergy of response, automation and correlation provide organizations with a holistic approach to reducing cyber incident dwell time. In more mature organizations, these measures are leveraged frequently by IR responders to transition from being threat gatherers to threat hunters.
When Incident correlation is available within the SAO platform, cyber threat dwell time is reduced through 3 separate but complementary capabilities:
- Category based correlation – Correlating incidents by type.
- Asset based correlation – Contextualizing the criticality and function of an asset
- Temporal correlation -Providing insight into suspicious activity or anomalous access
Defense in Depth strategies is designed so that high-value targets, such as privileged accounts, are monitored for increased or suspicious activity (Marcu et al. 5). The incident correlation engine not only visualizes this but also provides information to help determine the source of an incident by identifying the points of entry into the affected infrastructure.
“Patient Zero” identification is accomplished through tracking the movement from a source to an end user, and assists responders in determining the epidemiology of the attack, and also possible intruder motives. The correlation engine can achieve this objective through correlating similar TTP amongst incidents and visualizing associational link analysis between hosts. This comparison produces a topology of the lateral movement and can easily identify and visualize the path of an intrusion and the nature of an attack. This permits incident responders to initiate containment actions in real time, as the intentions and objectives of hackers are readily determined.
Dwell time of cyber threats can be significantly reduced from the industry average length, currently measured in the 100s of days, to only a few hours by providing a system capable of identifying not only the magnitude of the attack but by providing a roadmap to successfully hunt the incident genesis point to prevent further proliferation.
In this short blog series, I will be discussing and discussing IncMan management features to demonstrate some of the power user functions in our most recent IncMan 188.8.131.52 SP release. Today we will be focusing on how to use the queues feature in IncMan. This functionality has been designed for a SOC team that manages large volumes of incidents with a flexible assignment schedule. This is typically used by SOC’s with a large amount of alerts and incidents, Managed Service Solution Providers and Managed Detect and Response Providers.
- Let’s begin by navigating to “General Settings” which is found in the Settings section.
- Select the section titled “Queue Settings”. Add a new queue by clicking the “+” symbol. The queue will need an email address. This will be used to email the relevant group of users when this incident type is selected.
- Now create a queue name and add the required mailing list for this queue. Click save.
- Navigate to the incident view to start using this queue. Select the Tree Options in the top right of the incident list.
- You will see the new queue that we have created “My New Queue”, in this example. For this queue to become visible, please add it to the selected items list by clicking on “My New Queue”
- The new queue will now be available for usage. See below:
- When you create incidents or update your incident templates you will be able to select this new queue option, expand the queue to see the incidents assigned to it, or be able to click on the queue to show an overview of associated incidents.
In this blog series, I will be discussing DFLabs IncMan management features to highlight the really powerful capabilities that have become available to IncMan users as part of our latest 184.108.40.206 SP release:
Today we focus on the creation of user groups. This useful feature allows the creation of groups of related users, for example, Tier 1 analysts or IT Operations teams. The benefit of this is that a defined group can be assigned specific tasks. This could be for a variety of different reasons:
- To assign a task or incident that require a specific skill set
- To assign task or incident to a specific stakeholder group for review or further investigation.
- To notify specific stakeholders about an incident or investigation
- To escalate an incident to the next tier
The Group functionality can be leveraged in many features across IncMan. We will now step through the process of adding a User Group.
- Let’s create a group. You will need administrator privileges and the required group creation permission to do this. Once you have verified this is the case, please head to User Management -> Groups
- In this section, you can view or modify existing groups and create additional groups of your own. Click the ‘+’ symbol above the user list to create a new group.
- Enter the name that you want to use to identify the Group. It is generally a good practice to assign the associated user profiles and general profiles to the group. For this example, we only need the group name, so please complete that.
- You will now be able to see the newly created group. You will also be presented with a number of additional options. For instance, adding users or editing the existing group information.
- Next, lets add users to the group that we have just created. You can select the users you wish to add to the group from the user list. If you have a lot of users, you can use the filter to quickly search for users. Then save and continue.
- Now that we have created our group and added our users we can begin assigning tasks to this group. Let’s head to an incident and into a playbook to start using this.
- Within the incident playbook, we can assign tasks to individual users. As you scroll down now, you will also notice that a new option is available, with the group name that we created.
- Having created our user group, we can assign Ownership and Authorization to our group instead of to a single user.
Last week, Anton Chuvakin from Gartner announced that Augusto Barros and himself are planning to conduct research in Q4 2017 on the topic of Security Orchestration, Automation and Response (SOAR), or Security Automation and Orchestration, depending on which analyst firms’ market designation you follow. At DFLabs we are very excited that Gartner is finally showing our market space some love and will be helping end users to better assess and differentiate SAO offerings.
Anton provided many questions that he wanted SAO vendors to prepare for. The questions immediately piqued our interest, with one question, in particular, standing out to us.
1.When is SOAR a MUST have technology? What has to be true about the organization to truly require SOAR? Why your best customer acquired the tools?
Anton also said that he had one main problem with Security Automation and Orchestration. In his own words, “For now, my main problem with SOAR (however you call those security orchestration and automation tools…if you say SOAPA or SAO we won’t hate you much) is that I have never (NEVER!) met anybody who thought “my SOAR is a MUST HAVE.”
The question is not entirely unwarranted. During my own time at Gartner covering the SOAR space, I spoke to many clients who were seeking an SAO solution without knowing that they were. Typical comments were, “I have too many alerts and false positives to be able to deal with them all”, or “We are struggling to hire enough skilled people to be able to respond to all of the incidents that we have to manage”. Another common comment was, “I am struggling to report operational performance to my executives?”. Often, these comments were followed by the question, “Do you know of any technology that can help?”.
Typically, these organizations had a mature security monitoring program, usually built around a SIEM. They often had critical drivers, such as regulatory requirements, or held sensitive customer data. We hear the same buying drivers from our own customer base.
To sum up the most common drivers for someone asking about Security Automation and Orchestration:
- A high volume of alerts and incidents and the challenge in managing them
- A large portfolio of diverse 3rd party security detection products resulting in a large volume of alerts
- Regulatory mandates for incident response and breach notification
- An overstretched security operations team
- Reporting risk and the operational performance of the CSIRT and SOC to an executive audience
One interesting thing is that when there is no external driver like regulatory compliance, deploying a Security Automation and Orchestration solution is often determined by maturity. Most organizations don’t realize that they will be unable to cope with the volume of alerts and the resulting alert fatigue until they have deployed a SIEM and a full advanced threat detection architecture.
The common misconception is that the SIEM can help to reduce the number of incoming alerts by applying correlation rules. This not entirely untrue, but correlation rules will only reduce a small percentage. They are essentially signature based. You need to know in advance what you want to correlate, and adding a correlation rule to cover all and every incoming alert is not a trivial task. Even with correlation rules, additional work will be required to qualify an incident. Gathering additional IoC’s, incident observables and context is still a very manual process. Lastly, detection is only one part of the entire incident response process – notifying stakeholders, gathering forensic evidence and threat containment will also have to be done manually. These are the areas where SAO solutions provide the greatest ROI – as a force multiplier.
Threat actors are increasingly adopting security automation and machine learning – security teams will have to follow suit, or risk falling behind.
Many organizations still conduct incident response based on manual processes. Many playbooks that we have seen in our customer base, for example, hand off to other stakeholders within the organization to wait for additional forensic data, and to execute remediation and containment actions.
While this may seem like good practice to avoid inadvertent negative consequences such as accidentally shutting down critical systems or locking out innocent users, it also means that many attacks are not contained in a sufficiently short time to avoid the worst of their consequences.
Manual Processes Cannot Compete with Automation
Reports are mounting about threat actors and hackers leveraging security automation and machine learning to increase the scale and volume, as well as the velocity of attacks. The implications for organizations should be cause for concern, considering that we have been challenged to effectively respond to less sophisticated attacks in the past.
Ransomware is a case in point. In its most simple form, a ransomware attack does not require the full cyber kill chain to be successful. A user receives an email attachment, executes it, the data is encrypted and the damage is done. At that point, incident response turns into disaster recovery.
Automated attacks have been with us for a long time. Worms and Autorooters have been around since the beginning of hacking, with WannaCry and its worming capability only the most recent example. But these have only automated some aspects of the attack, still permitting timely and successful threat containment further along the kill chain.
Threat actors have also leveraged automated command and control infrastructure for many years. DDoS Zombie Botnets, for example, are almost fully automated. To sum it up, the bad guys have automated, the defenders have not. Manual processes cannot compete with automation.
With the increase in the adoption of automation and machine learning by cyber criminals, enterprises will find that they will have to automate as well. The future mantra will be “Automate or Die”.
Making the Cure More Palatable Than the Disease
But automating containment actions is still a challenging topic. Here at DFLabs we still encounter a lot of resistance to the idea by our customers. Security teams understand that the escalating sophistication and velocity of cyber-attacks means that they must become more agile to rapidly respond to cyber incidents. But the risk of detrimentally impacting operations means that they are reluctant to do so, and rarely have the political backing and clout even if they want to.
Security teams will find themselves having to rationalize the automation of incident response to other stakeholders in their organization more and more in the future. This will require being able to build a business case to justify the risk of automating containment. They will have to explain why the cure is not worse than the disease.
There are three questions that are decisive in evaluating whether to automate containment actions:
- How reliable are the detection and identification?
- What is the potential detrimental impact if the automation goes wrong?
- What is the potential risk if this is not automated?
Our approach at DFLabs to this is to carefully evaluate what to automate, and how to do this safely. We support organizations in selectively applying automation through our R3 Rapid Response Runbooks. Incident Responders can apply dual-mode actions that combine manual, semi-automated and fully automated steps to provide granular control over what is automated. R3 Runbooks can also include conditional statements that apply full automation when it is safe to do so but request that a human vet’s the decision in critical environments or where it may have a detrimental impact on operational integrity.
We just released a whitepaper, “Automate or Die, without Dying”, by our Vice President of Product Evangelism and former Gartner analyst, Oliver Rochford, that discusses best practices to safely approach automation. Download the whitepaper here for an in-depth discussion on this controversial and challenging, but important topic.
Let me start by saying that total prevention is not attainable with today’s technology. Whether through negligence or ignorance, any data stored on a network is subject to unauthorized access by 3rd parties. Instead, we must combine Prevention with Detect and Respond. We know we are going to get breached, so we must focus on the how we deal with that.
One significant activity that can improve cyber incident response and enable the timely mitigation of threats is the transfer of knowledge after an incident as part of a formalized “Lessons Learned” phase of the incident response life cycle. Integrating successful processes and procedures from previously successful incident response activities can play a critical role in determining whether a business will suffer in terms of operational integrity, reputation and legal liability. A publicized security breach will lower customer confidence in the services offered by an organization as well as call into question the safety of their sensitive 3rd party information. This impacts a business credibility and translates directly into lost revenue.
In regulated industries, increased regulatory scrutiny is an additional consequence of a breach. This involves evaluating if the tools and procedures used in responding to security threats were sufficient. Integrating lessons learned into existing and future incident response playbooks ensures that the proper technologies and processes are deployed, and avoids accusations of gross negligence, expensive and time-consuming investigations and regulatory demands.
Procedural improvements can be incorporated into incident workflows via incident playbooks and ensure that all stages of the incident response process have been acknowledged and addressed. It also ensures that required security measures and procedures are documented and relevant stakeholders informed of their roles in case of an incident.
This process can be augmented through machine learning. Applying machine learning to this problem requires that all relevant data associated with incidents are analyzed and automatically applied to future incidents. DFLabs recently released DF-ARK machine learning capability to do precisely this. Our patent-pending Automated Responder Knowledge (DF-ARK) module applies machine learning to historical responses to threats and recommends relevant runbooks and paths of action to manage and mitigate them. DF-ARK requires sufficient training data – it begins with no knowledge, but learns from the experience and actions of your security team, becoming more effective over time. DF-ARK implements supervised case-based reasoning machine learning.
It also involves combining automated workflows and manual procedures to keep a human in the loop. This can be constantly improved by applying new observations and data, to fine tune existing methods and procedures identified in the lessons learned phase.
IncMan offers the R3 Rapid Response Runbook engine and Dual Mode playbooks to facilitate this. R3 Runbooks are created using a visual editor and support granular, stateful and conditional workflows to orchestrate and automate incident response activities such as incident triage, stakeholder notification, data and context enrichment and threat containment. Dual Mode Playbooks support manual, semi-automated and automated actions, meaning that users can automate the action without automating the decision.
Adding all of this together, here are 5 best practices for increasing the effectiveness of incident response via lessons learned:
- Encourage feedback from responders at every level. First, second and third line SOC operators and incident handlers each have a unique perspective that must be incorporated into future response playbooks.
- Review all relevant documentation to ensure compliance. This includes organizational policies or regulatory mandates to ensure any disparities are addressed in future playbooks.
- Chronicle any unanticipated or unusual events to extend procedures to mitigate similar occurrences in the future
- Annotate enhancements to existing processes that were identified during the incident response cycle.
- Designate a business unit or individual to be responsible for making necessary changes to existing playbooks, processes or procedures and to distribute these to stakeholders.
Capitalizing on lessons learned during incident response provides immediate and long-term benefits that contribute crucial time savings necessary to successfully mitigate future threats. Deploying a platform designed to facilitate the rapid inclusion of identified improvements to the incident workflow, such as DFLabs’ IncMan, can not only reduce the time it takes to fully investigate an incident but also reduces the overheads required to do so. If you want more information please contact us at DFLabs for a no obligation demonstration of exactly how we can improve your response time, workflows and remediation activities.
A collaborative environment between IT and security groups is critical. The number of cyber security incidents currently impacting networks and customers is increasing exponentially and mitigating security incidents and risks is more complex than ever before. Timely and effective communication are keys to improved collaboration between all parties involved in the cyber incident response process. One of the simplest and most effective methods to improve communication between all relevant IT and security groups is to deploy a common, shared platform where stakeholders can review and analyze incidents across the entire cyber landscape. A cross-departmental platform enables them to focus on correlating cyber incidents and risks with contextual information relevant to their role and responsibilities plays a significant part in organizational success in this regard.
Incorporating knowledge transfer between disparate business entities often separated both geographically and functionally is essential to facilitate a better understanding of the current IT and security challenges. The preferred method to provide this collaborative environment is via electronic based communication mediums and devices. To tie all of these channels together, an organization should consider deploying a cyber incident response platform, and the platform must be able to integrate these technologies, be it SMS, email or other messaging medium, to cover the broadest range of communication channels to transmit critical information to stake holders.
Another successful strategy that focuses on effectively communicating timely, critical information to relevant stakeholders is via the creation of an incident notification group. IncMan supports the creation of groups of Watchers that are appraised of incidents and activities automatically via SMS, email or an integrated communications system. A Watcher group can ensure that information is properly communicated to the appropriate stakeholder(s). This provides differing stakeholders with the capability of monitoring incidents that may impact business continuity. Additionally, IncMan has integrated communications capabilities comply with industry best practices which recommend having a separate, secure and hardened communications channel if email or other internal communication channels are compromised. This independent messaging capability also provides additional benefits such as asymmetric encryption capabilities.
Leveraging a dedicated solution that can orchestrate the communications to stakeholders standardizes the process of cyber incident response and mitigation and is the key to ensuring a more effective response. If you would like more information or a free no obligation demonstration of how IncMan from DFLabs can more effectively automate and orchestrate your incidents please contact us at [email protected]
Alert fatigue is the desensitization when overwhelmed with too much information. The constant repetition and sheer volume of redundant information are painful and arduous but sadly often constitutes the daily reality for many people working in cyber security. Mike Fowler (DFLabs’ VP of Professional Services) discusses several best practices to help with some of the challenges involved in this in his recent whitepaper “DFLabs as a Force Multiplier in Incident Response”. I am going to discuss another one, but looking at it from a slightly different angle.
Imagine the scenario where we have tens of thousands of alerts. Visualize these as Jigsaw pieces with a multitude of different shapes, sizes and colors and the additional dimension of different states. We have alerts from a firewall, anomalies from behavioral analytics, authentication attempts, data source retrieval attempts or policy violations. Now, there are a lot of ways to shift through this information, for example by using a SIEM’s to correlate the data and reduce the some of the alerts. The SIEM could identify and cross-reference the colors and shapes of the jigsaw pieces so to speak.
The next question once that I’ve got the all the pieces I need for the puzzle is how do I put this together? How do I complete the puzzle and unlock the picture?
The “what does the jigsaw picture?” question is something that will often puzzle the responders, pun intended. How do you prioritise and escalate incidents to the correct stakeholders? How do you apply the correct playbook for a specific scenario? How do you know which pieces of information to analyse to fit the jigsaw pieces together and make sure the puzzle looks correct?
Automation process can speed up putting that puzzle together, but making sure you automate the right things is just as critical. If skilled staff are running search queries that are menial, repetitive and require little cognitive skill to execute, you should ask yourself why they are performing these and not instead focused on analyzing the puzzle pieces to figure out how they fit together?
Remove the menial tasks. Allow automation to do the heavy lifting so your teams are not only empowered by the right information they need to successfully manage the response to an incident but also to give them more time to figure out the why, how and what of the threat.
We also welcome you to join us for a webinar hosted by Mike Fowler on this topic on the 6th of September.
Over the past few months during the post-hoc analysis of WannaCry-Petya, we have spoken in great lengths about what should have been done during the incident. This is quite a tricky thing to do in a balanced way because we are all clever in hindsight. What hasn’t been spoken about enough is understanding more generally what we need to do when things go wrong.
This question isn’t as simple as it appears, as there are a lot of aspects to consider during an incident, and only a brief window to identify, contain and mitigate a threat. Let’s look at just a few of these:
– Response times
This is often the greatest challenge but of utmost importance. The response is not only understanding the “how” and “why” of a threat but is also about putting the chain of events into action to make sure that the “what” doesn’t spiral out of control.
– Creating an effective playbook
A playbook should be a guide on how your incident response plan must be executed. Orchestration platforms contain these playbooks/runbooks. Also, note that these are not generic plug and forget policies. They need to be optimized and mapped to your business and regulatory requirements and are often unique to your organization. Otherwise, the incident will be controlled by an incorrect playbook.
– Skills and tool availability
Do you have the correct skills and tools available and are you able to leverage these. Do you understand where your security gaps are and do you know how to mitigate them?
On paper, incident response always works. Right until the moment of truth during a data breach that shows that it doesn’t. To avoid relying on theory only, it is best to run breach simulations and simulate some of the attacks that may affect your organization to find out if your processes and playbooks also work under more realistic conditions.
We’re always playing catch‒up for many reasons—new technologies, new vulnerabilities, and new threats. Software and hardware may possibly always be at the mercy of hackers, criminal actors and other threat actors, so prevention alone is futile. We have to become more resilient and better at dealing with the aftermath of an attack.
The key summary for me is this: How do you respond? Can the response be improved? Utilize the lessons learned in breach simulations to understand how you make the response better than before.