Whether you call it Incident Management or Incident Handling, most will agree that there is a distinct difference between responding to an incident and managing an incident. Put simply, Incident Response can be defined as the “doing”, while Incident Management can be defined as the “orchestrating”. Proper Incident Management is the foundation and structure upon which a successful Incident Response program must be based. There are numerous blogs, articles and papers addressing various aspects of the differences between Incident Response and Incident Management, dating back at least a decade. Why add another to the top of the pile? Because while most organizations now see the value in putting people, tools, and basic processes in place to respond to the inevitable incident, many still do not take the time to develop a solid Incident Management process to orchestrate the response effort.
Security incidents create a unique environment, highly dynamic and often stressful, and outside the comfort zone of many of those who may be involved in the response process. This is especially true during complex incidents where ancillary team members, such as those from Human Resources, Legal, Compliance or Executive Management, may become involved. These ancillary team members are often accustomed to working in a more structured environment and have had very little previous exposure to the Incident Response process, making Incident Management an even more critical function. Although often overlooked, the lack of effective Incident Management will invariably result in a less efficient and effective process, leading to increased financial and reputational damage from an incident.
Many day-to-day management processes do not adapt well to these complex challenges. For example, as the size and complexity of a security incident increase, the number of people that a single manager can effectively supervise directly decreases. It is also not uncommon for some employees to report to more than one supervisor, which during a security incident can lead to mixed directives and confusion. During an incident, it is critical that information flows quickly and smoothly both vertically and horizontally, and many organizations’ existing communication methods do not adapt well to this.
When an ad-hoc Incident Management system is used, the response process becomes much less consistent and effective. A common pitfall of this ad-hoc management style is that it can create a flat management structure, forcing the Incident Response Coordinator to directly oversee the functions of many groups with vastly different objectives. A flat structure such as this also tends to inhibit the flow of information between the individual groups.
Another common pitfall of this ad-hoc management style is that it often results in a fragmented and disorganized process. Without proper management to provide clear objectives and expectations, it is easy for individual groups to create their own objectives based on what they believe to be the priority. This seriously limits effective communication between the individual groups, forcing each to work with incomplete or incorrect information.
There are numerous ways in which the Incident Management process can be streamlined. On Wednesday, January 31st, DFLabs will be releasing a new whitepaper titled “Increasing the Effectiveness of Incident Management”, discussing the lessons that can be learned from decades of trial and error in another profession, the fire service, to improve the effectiveness of the Incident Management process. John Moran, Sr. Product Manager at DFLabs, will also be joining Paul and the Enterprise Security Weekly Team on their podcast at 1 PM EST on January 31st to discuss some of these lessons in more detail. Stay tuned to the DFLabs website, or listen in on the podcast on January 31st for more details!
At the heart of incident response, and by extension of Security Automation and Orchestration (SAO) technologies, resides the cyber incident. A typical definition of a cyber security incident is “Any malicious act or suspicious event that compromises or attempts to compromise, or disrupts or tries to disrupt, a critical cyber asset”. Almost everything we do in a SOC or a CSIRT is based on incidents, and there are a variety of potential incident sources, for example:
- Alerts from cyber security detection technologies such as Endpoint Detection & Response (EDR) or User and Entity Behavior Analytics (UEBA) tools
- Alerts from Security Information & Event Management Systems (SIEM)
- Emails from ITSM or case management systems
- Website submissions from internal stakeholders and whistle-blowers
- Phone calls from internal users and external 3rd parties
This diversity of incident sources means that a solid SAO solution must offer a variety of different methods to create incidents. Regulatory frameworks also frequently mandate being able to originate incidents from different sources. DFLabs IncMan offers a rich set of incident creation options.
There are three primary ways to create incidents in IncMan, offering flexibility to accommodate a variety of incident response process requirements and approaches.
Option 1: Automated Incident Creation
We will cover automated incident creation in more detail in a future post. In the meantime, here is where to find this feature.
Open the settings menu, then go to External Sources.
Under External Sources, three options are available as sources for automated incident creation:
- Incoming events automation, for CEF/Syslog
- Incoming Mail automation, for a monitored email account
- Integrations, for all QIC integration components.
Automated incident creation supports a variety of filters, enabling a rules-based approach. Incidents can also be created through the IncMan SOAP API; certified third-party applications, such as Splunk, use this mechanism to create incidents within IncMan.
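IncMan's filter configuration is not described here, so the following is only a minimal sketch of the rules-based idea, assuming a CEF message arriving over Syslog. The `parse_cef` and `evaluate` helpers, the two example rules and the shape of the resulting incident are all hypothetical and are not IncMan's API. The extension parsing is deliberately simplified and does not handle escaped `=` characters.

```python
import re
from typing import Callable, Optional

# A hypothetical rule: a name plus a predicate over the parsed event.
Rule = tuple[str, Callable[[dict], bool]]

CEF_HEADER = ["version", "vendor", "product", "device_version",
              "signature_id", "name", "severity"]

def parse_cef(message: str) -> dict:
    """Parse the CEF part of a (possibly Syslog-prefixed) message."""
    body = message[message.index("CEF:"):]  # drop any Syslog header
    # Seven header fields separated by unescaped pipes; the rest is the extension.
    parts = re.split(r"(?<!\\)\|", body, maxsplit=7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    event = dict(zip(CEF_HEADER, parts[:7]))
    event["version"] = event["version"][len("CEF:"):]
    extension = parts[7] if len(parts) > 7 else ""
    # Simplified: key=value pairs whose values may contain spaces.
    event["ext"] = dict(re.findall(r"(\w+)=(.*?)(?=\s\w+=|$)", extension))
    return event

def severity_at_least(threshold: int) -> Callable[[dict], bool]:
    # CEF severity may also be a word such as "High"; only numeric values match here.
    return lambda event: event["severity"].isdigit() and int(event["severity"]) >= threshold

RULES: list[Rule] = [
    ("high-severity", severity_at_least(7)),
    ("malware-name", lambda event: "malware" in event["name"].lower()),
]

def evaluate(message: str, rules: list[Rule]) -> Optional[dict]:
    """Return a new incident if any rule matches, otherwise None."""
    event = parse_cef(message)
    matched = [name for name, predicate in rules if predicate(event)]
    if not matched:
        return None
    return {"title": event["name"], "severity": event["severity"],
            "source_ip": event["ext"].get("src"), "matched_rules": matched}
```

A message that matches no rule is simply not promoted to an incident, which is the behavior a filter is meant to provide. In practice, such rules would more likely live in configuration than in code.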
Option 2: Manual Incident Creation
Click the incidents menu option, then click the + symbol on the incidents screen.
Fill out all mandatory fields (these can be defined in the custom fields screen), then step through and complete the incident wizard to create the incident.
Once all relevant fields have been completed, click save. The incident will then appear in the incident view and as part of the queue you assigned in the details screen.
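As a rough sketch of what the mandatory-field check amounts to, assuming the wizard's form is represented as a Python dictionary: `MANDATORY_FIELDS` and `IncidentQueues` are hypothetical stand-ins for the fields configured on the custom fields screen and the queue selected in the details screen.

```python
from dataclasses import dataclass, field

# Hypothetical mandatory fields; in IncMan these are configured on the custom fields screen.
MANDATORY_FIELDS = ("title", "category", "severity", "queue")

@dataclass
class IncidentQueues:
    queues: dict[str, list[dict]] = field(default_factory=dict)

    def save(self, form: dict) -> dict:
        """Reject the form if a mandatory field is empty, otherwise file it in its queue."""
        missing = [name for name in MANDATORY_FIELDS
                   if not str(form.get(name, "")).strip()]
        if missing:
            raise ValueError("missing mandatory fields: " + ", ".join(missing))
        incident = dict(form)
        self.queues.setdefault(incident["queue"], []).append(incident)
        return incident
```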
Option 3: Incident Creation from Source
Select an incident source for the incident you want to create, for example a Syslog or CEF message, an email, or a threat intelligence source (STIX/TAXII, ThreatConnect).
In this screen, you can then convert this source item to an incident, or link the source to an existing incident.
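IncMan leaves the convert-or-link decision to the analyst. The sketch below shows one hypothetical policy that could inform that choice: link the source item to an open incident that shares at least one indicator, otherwise convert it into a new incident. The field names and the `INC-` identifier format are illustrative assumptions.

```python
def triage_source_item(item: dict, incidents: list[dict]) -> dict:
    """Link a source item to an open incident sharing an indicator, or convert it to a new one."""
    indicators = set(item.get("indicators", []))
    for incident in incidents:
        if incident["status"] == "open" and indicators & incident["indicators"]:
            # Link: record the source and merge its indicators into the incident.
            incident["linked_sources"].append(item["id"])
            incident["indicators"] |= indicators
            return incident
    # Convert: no related open incident, so the item becomes a new incident.
    new_incident = {
        "id": f"INC-{len(incidents) + 1}",
        "status": "open",
        "indicators": indicators,
        "linked_sources": [item["id"]],
    }
    incidents.append(new_incident)
    return new_incident
```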
Since I am a new face (or perhaps just a name to most of you) here at DFLabs, I wanted to take a moment to introduce myself before we jump into today’s topic. My name is John Moran and I recently joined the DFLabs team as Senior Product Manager. Before joining DFLabs, I worked in a variety of roles, including incident response consulting, security operations and law enforcement. While I have many responsibilities at DFLabs, one of my primary roles, and the one I am perhaps most passionate about, is ensuring that DFLabs continues to bring you the industry-leading security orchestration, automation and response features that you have come to expect from IncMan. If you have feature requests, suggestions or other comments, good or bad, regarding IncMan, I’d love to hear from you. Please reach out to me at [email protected]. With that out of the way, let’s get to the good stuff…
While reports such as the Verizon DBIR indicate that the increased focus on building holistic detect-and-respond security programs has helped reduce the time to detect security incidents, these same reports also show that attackers continue to evolve, and a gap between compromise and detection remains. What I would like to discuss here, though, might be described as the opposite problem: overreacting to a perceived security incident, or conducting a full-scale response before validating that a security incident has indeed occurred.
Please do not misunderstand me: I will always advocate the “treat it as an incident until you know otherwise” approach to incident response. However, I would also argue that the response to any security incident should always be a measured one. The incident response process must be rapid and decisive, but just as under-responding to an incident can present serious financial and reputational risks to an organization, so too can over-responding to a potential security incident. As with any other business process, incident response must provide value to the organization. Continued over-response to perceived security incidents will reduce the overall value that incident response provides and, over time, will result in decreased support from management.
Few studies have truly been able to quantify the costs of failing to conduct a measured response. A 2015 study by the Ponemon Institute suggests that responding to incidents triggered by erroneous or inaccurate malware alerts costs large organizations up to 395 hours per week, or almost $1.3 million a year. Note that this study considered only the time spent investigating malware alerts. While malware detection technologies have undoubtedly improved in the two years since the study was conducted, most organizations run a variety of detection technologies, all generating alerts that must be investigated. Ponemon also assumed that the organizations surveyed were conducting an appropriate, measured response to each of these false positives. With the cost already this high under that assumption, it is easy to see how costly over-responding to incidents can become at scale.
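To put the Ponemon figures in perspective, they imply a rough hourly cost. Assuming the 395 hours per week applies to all 52 weeks of the year (an assumption made here, not a figure from the study), the arithmetic is:

```latex
\begin{aligned}
395\ \tfrac{\text{hours}}{\text{week}} \times 52\ \tfrac{\text{weeks}}{\text{year}} &= 20{,}540\ \tfrac{\text{hours}}{\text{year}} \\
\frac{\$1{,}300{,}000}{20{,}540\ \text{hours}} &\approx \$63\ \text{per hour}
\end{aligned}
```

Since the study reports almost $1.3 million, roughly $63 per hour is an upper-end estimate. If the effort spent per alert stays constant, the annual cost grows linearly with alert volume, which is why over-response becomes so expensive at scale.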
While conducting incident response consulting, I have personally seen organizations spend weeks to months on full-scale incident response activities, and then tens of thousands of dollars on incident response consulting, only to find out that the perceived incident was based on faulty information or conclusions. So how do you minimize the risk of over-responding while still ensuring that each potential incident is properly investigated? Here are five tips based on my experience:
- Have the right people in place – There is simply no substitute for having the right people in place. While proper training and experience are vital, the qualities of an effective analyst extend beyond these two attributes. It is crucial to have analysts who possess an analytical mindset and can remain level-headed amid a stressful and dynamic environment. Training can be provided and experience can be gained; however, some of these less tangible qualities are much harder to learn.
- Have the right toolsets in place – Attempting to substitute tools for skills will inevitably lead to failure. However, it is important to have the proper tools in place to give those highly skilled analysts the information they need to make fact-based conclusions. Even the most highly skilled analysts will inevitably arrive at the wrong conclusion when presented with incomplete or inaccurate information.
- Know the threat landscape – Threat intelligence (and I mean actual intelligence, not just a machine-readable threat feed) can provide much greater context surrounding a potential security incident. Analysts must also be given the opportunity to stay up to date on the ever-changing threat landscape. This gives decision makers a much more accurate basis on which to set their initial level of response. Often, it is a lack of knowledge, and conclusions based on assumptions, that lead to a dramatic over-response.
- Know your limitations – Unless you are fortunate enough to work for a government agency or one of the world’s largest organizations, chances are that at some point your needs may exceed the scope of your internal capabilities. These limitations are not weaknesses in and of themselves. Instead, the risk arises when an organization fails to recognize its limitations and attempts to work outside of those bounds. It is important to know when to consider tapping into external resources such as consulting, incident response retainers and managed services.
- Replace the emotional response with processes and procedures – Even the most highly skilled analysts will approach some potential security incidents with certain biases or preconceived notions. It is essential to implement quality processes and procedures that maximize the analyst’s skills, take full advantage of the available tools, and guide the incident response process. Processes and procedures surrounding incident validation, incident classification and initial resource allocation can keep the process on track and prevent it from straying down the wrong, costly road.
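As an illustration of how a validation and initial resource allocation procedure can stand in for an emotional response, the sketch below encodes one hypothetical escalation rule. The tier names, the inputs and the thresholds are assumptions chosen for the example, not a recommended standard; the point is that escalation follows written criteria rather than an analyst's first impression.

```python
from enum import IntEnum

class ResponseTier(IntEnum):
    VALIDATE = 1    # analyst confirms whether an incident actually occurred
    CONTAIN = 2     # targeted containment led by an incident handler
    FULL_SCALE = 3  # full team, management notification, possible external support

def initial_response(validated: bool, corroborating_sources: int,
                     intel_match: bool, critical_asset: bool) -> ResponseTier:
    """Hypothetical escalation rule: scale the response to the evidence."""
    if not validated:
        # Treat it as an incident, but start by validating it rather than mobilizing everyone.
        return ResponseTier.VALIDATE
    # Stronger evidence: several detection sources agree, or threat intelligence matches.
    strong_evidence = corroborating_sources >= 2 or intel_match
    if critical_asset and strong_evidence:
        return ResponseTier.FULL_SCALE
    return ResponseTier.CONTAIN
```

Writing the rule down also makes it reviewable: if a full-scale response later proves unnecessary, the criteria can be adjusted rather than relitigated incident by incident.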
The most important goal of any security program must always remain to never under-respond to an incident. However, integrating these five tips into your security program will undoubtedly provide a better, more efficient process to determine what the appropriate level of response to each potential security incident should be, greatly reducing the risk of over-responding.
Let me start by saying that total prevention is not attainable with today’s technology. Whether through negligence or ignorance, any data stored on a network is subject to unauthorized access by third parties. Instead, we must combine prevention with detection and response. We know we are going to be breached, so we must focus on how we deal with it.
One significant activity that can improve cyber incident response and enable the timely mitigation of threats is the transfer of knowledge after an incident, as part of a formalized “Lessons Learned” phase of the incident response life cycle. Integrating the processes and procedures that worked in previous incident response activities can play a critical role in determining whether a business will suffer in terms of operational integrity, reputation and legal liability. A publicized security breach lowers customer confidence in the services an organization offers and calls into question the safety of the sensitive third-party information it holds. This damages the business’s credibility and translates directly into lost revenue.
In regulated industries, increased regulatory scrutiny is an additional consequence of a breach, including an evaluation of whether the tools and procedures used to respond to security threats were sufficient. Integrating lessons learned into existing and future incident response playbooks ensures that the proper technologies and processes are deployed, and helps avoid accusations of gross negligence, expensive and time-consuming investigations, and regulatory demands.
Procedural improvements can be incorporated into incident workflows via incident playbooks, ensuring that all stages of the incident response process are acknowledged and addressed. Playbooks also ensure that required security measures and procedures are documented and that relevant stakeholders are informed of their roles in case of an incident.
This process can be augmented through machine learning. Applying machine learning to this problem requires that all relevant data associated with incidents be analyzed and automatically applied to future incidents. DFLabs recently released the DF-ARK machine learning capability to do precisely this. Our patent-pending Automated Responder Knowledge (DF-ARK) module applies machine learning to historical responses to threats and recommends relevant runbooks and paths of action to manage and mitigate them. DF-ARK requires sufficient training data: it begins with no knowledge, but learns from the experience and actions of your security team, becoming more effective over time. DF-ARK implements supervised, case-based reasoning machine learning.
It also involves combining automated workflows and manual procedures to keep a human in the loop. This can be constantly improved by applying new observations and data, to fine tune existing methods and procedures identified in the lessons learned phase.
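DF-ARK’s implementation is not public, so the following is not how DF-ARK works. It is a minimal, generic sketch of case-based reasoning (retrieve similar past cases, reuse their solutions, retain new cases) using Jaccard similarity over sets of incident features. The feature strings, the `resolved_ok` flag standing in for analyst feedback, and the helper names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Case:
    features: frozenset[str]  # e.g. {"type:phishing", "asset:mail"}
    runbook: str              # runbook the team actually used
    resolved_ok: bool         # analyst feedback: did the response work?

def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(new_features: set[str], case_base: list[Case], k: int = 3) -> list[tuple[str, float]]:
    """Retrieve the k most similar successful cases and rank their runbooks."""
    query = frozenset(new_features)
    successful = [c for c in case_base if c.resolved_ok]
    nearest = sorted(successful, key=lambda c: jaccard(query, c.features), reverse=True)[:k]
    scores = Counter()
    for case in nearest:
        scores[case.runbook] += jaccard(query, case.features)
    return scores.most_common()

def retain(case_base: list[Case], features: set[str], runbook: str, resolved_ok: bool) -> None:
    """Store a finished incident so it can inform future recommendations."""
    case_base.append(Case(frozenset(features), runbook, resolved_ok))
```

Filtering on `resolved_ok` is where the human stays in the loop: the analysts’ judgment of whether a response worked decides which cases the recommender learns from. An empty case base yields no recommendations, which mirrors the start-with-no-knowledge behavior described above.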
IncMan offers the R3 Rapid Response Runbook engine and Dual Mode playbooks to facilitate this. R3 Runbooks are created using a visual editor and support granular, stateful and conditional workflows to orchestrate and automate incident response activities such as incident triage, stakeholder notification, data and context enrichment and threat containment. Dual Mode Playbooks support manual, semi-automated and automated actions, meaning that users can automate the action without automating the decision.
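R3 Runbooks are built in a visual editor rather than in code, but the underlying idea can be sketched in plain Python. The example below assumes a hypothetical `KNOWN_BAD_IPS` set for enrichment and an `approve` callback representing the analyst, and shows a conditional workflow whose containment action can run in manual, semi-automated or automated mode. In semi-automated mode the block itself is executed automatically, but only after a human makes the decision.

```python
from typing import Callable

Approver = Callable[[str], bool]  # the human decision: True means proceed

# Hypothetical threat-intelligence data used for enrichment (documentation IP range).
KNOWN_BAD_IPS = {"203.0.113.7"}

def run_containment_runbook(incident: dict, mode: str, approve: Approver) -> list[str]:
    """Conditional runbook: enrich, notify, then contain according to the action's mode."""
    log = []
    # Step 1 (automated): enrichment, stored on the incident so later steps can use it.
    incident["reputation"] = "malicious" if incident["src_ip"] in KNOWN_BAD_IPS else "unknown"
    log.append(f"enriched: {incident['reputation']}")
    # Step 2 (conditional branch): only known-bad sources proceed toward containment.
    if incident["reputation"] != "malicious":
        log.append("notified: tier-1 queue for manual review")
        return log
    log.append("notified: incident handler")
    # Step 3: the containment action; who makes the decision depends on the mode.
    if mode == "automated":
        proceed = True
    elif mode == "semi-automated":
        proceed = approve(f"Block {incident['src_ip']} at the firewall?")
    else:  # manual: the analyst performs the action outside the runbook
        log.append("manual: containment task assigned to analyst")
        return log
    log.append(f"blocked {incident['src_ip']}" if proceed else "containment declined")
    return log
```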
Adding all of this together, here are five best practices for increasing the effectiveness of incident response through lessons learned:
- Encourage feedback from responders at every level. First, second and third line SOC operators and incident handlers each have a unique perspective that must be incorporated into future response playbooks.
- Review all relevant documentation, including organizational policies and regulatory mandates, to ensure compliance and to ensure that any disparities are addressed in future playbooks.
- Chronicle any unanticipated or unusual events, so that procedures can be extended to mitigate similar occurrences in the future.
- Annotate enhancements to existing processes that were identified during the incident response cycle.
- Designate a business unit or individual to be responsible for making necessary changes to existing playbooks, processes or procedures and to distribute these to stakeholders.
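The last practice, designating an owner, is easier to follow when feedback is captured in a consistent structure. The following sketch, with a hypothetical `Lesson` record and `build_change_requests` helper, groups lessons into one change request per playbook, records which perspectives (SOC tiers, compliance review) raised them, and assigns the designated owner.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Lesson:
    playbook: str  # playbook the lesson applies to
    source: str    # who raised it, e.g. "tier-1", "incident-handler", "compliance-review"
    note: str      # the proposed change

def build_change_requests(lessons: list[Lesson], owner: str) -> list[dict]:
    """Group lessons into one change request per playbook, assigned to a single owner."""
    grouped = defaultdict(list)
    for lesson in lessons:
        grouped[lesson.playbook].append(lesson)
    return [
        {
            "playbook": playbook,
            "owner": owner,
            "raised_by": sorted({lesson.source for lesson in items}),
            "changes": [lesson.note for lesson in items],
        }
        for playbook, items in sorted(grouped.items(), key=lambda kv: kv[0])
    ]
```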
Capitalizing on lessons learned during incident response provides immediate and long-term benefits, including the time savings needed to successfully mitigate future threats. Deploying a platform designed to facilitate the rapid inclusion of identified improvements into the incident workflow, such as DFLabs’ IncMan, can not only reduce the time it takes to fully investigate an incident but also reduce the overhead required to do so. For more information, contact DFLabs for a no-obligation demonstration of exactly how we can improve your response time, workflows and remediation activities.