How to Implement Incident Response Automation the Right Way

One of the most pressing challenges facing cyber security professionals today is the sheer number of security incident alerts, which is becoming too high to cope with even for the largest and best-equipped security teams. Two factors drive the increase. The more obvious one is the sharp rise in cyber attacks in recent years. The other is more complex and somewhat ironic: it arises from the growing use of different tools and devices within an organization whose original function is to detect and mitigate incidents in the first place.

Security Operations Centers (SOCs) now use more devices designed to alert security analysts to cyber attacks than ever before, with the side effect that security teams receive more alerts than they can handle. Consequently, some of the most credible threats go undetected or are simply not acted upon.

Addressing the Threat Noise Issue

With so many systems monitoring for potential security threats and generating alerts, and with many SOCs severely understaffed, it comes as no surprise that analysts have a hard time staying on top of every single alert and responding appropriately and in a timely fashion. Since they don’t have the time or the staff to handle all alerts, SOCs often choose to disregard some and focus on those they deem credible, which understandably can lead to real threats slipping through the cracks and inflicting serious and irreparable damage on organizations.

In an effort to address the issue of threat noise, some SOCs opt for either reducing the number of devices generating alerts or expanding their staff. While seemingly simple and straightforward, these options can be both counterproductive and quite costly. They are not the only solutions available to SOCs, however: there is an alternative that neither allows alerts to go undetected nor requires hiring additional security analysts.

Automating the Most Time-Consuming Parts of the Process

While the number of alerts generated by monitoring devices is not necessarily a cause for concern in itself, the fact that alerts take a significant amount of time to analyze and handle properly often makes them an insurmountable challenge for understaffed security teams. One promising tactic for tackling this challenge is to enable an automated response to specific types of alerts, an approach that can yield a wide range of benefits for organizations.

The idea is to automate the routine tasks that are repetitive and that do not require a lot of human expertise, but do usually take a lot of time to respond to and handle. By automating the response to these types of alerts, SOC analysts get more time to handle the alerts that pose a greater risk to their organizations, which must be analyzed in a more focused and comprehensive manner.

As noted in a recent SANS Spotlight paper titled “SOC Automation – Disaster or Deliverance”, written by Eric Cole: “The rate at which organizations are attacked is increasing, as is the speed at which those attacks compromise a network – and it is not possible for a human to keep up with the speed of a computer. The only way to beat a computer is with a computer”.

However, it must be noted that implementing incident response automation brings a certain degree of risk to organizations, as it might produce false positives, leaving analysts unable to determine whether specific alerts are legitimate threats or not. This means that if automation is not properly implemented with predetermined processes and procedures in place, analysts may end up spending much of their time analyzing alerts that aren’t actual attacks and don’t pose any foreseeable danger. Having said that, organizations should not shy away from automation because of these potential drawbacks, but should instead implement it in a balanced and well-thought-out manner. The key is to manage and control false positives as opposed to simply trying to eliminate them. It is therefore important to automate only the low-risk alerts that are not expected to have a major impact on an organization and leave the more serious threats to be handled by security professionals who can apply their expertise to resolve them.
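
The rule of automating only low-risk, routine alerts can be sketched as a simple routing function. This is a minimal illustration, not a description of any particular product: the alert types, the 0 to 100 risk scale, and the threshold of 30 are all hypothetical values that a SOC would replace with its own policy.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTOMATE = "automate"  # handled by an automated response
    ANALYST = "analyst"    # escalated to a human analyst


@dataclass(frozen=True)
class Alert:
    alert_id: str
    alert_type: str  # illustrative label, e.g. "phishing_report"
    risk_score: int  # hypothetical scale: 0 (benign) to 100 (critical)


# Hypothetical policy values; a real SOC would derive these from its own risk assessment.
AUTOMATABLE_TYPES = {"phishing_report", "known_bad_ip", "failed_login_burst"}
RISK_THRESHOLD = 30


def route_alert(alert: Alert) -> Route:
    """Automate an alert only if it is both a routine type and low risk."""
    if alert.alert_type in AUTOMATABLE_TYPES and alert.risk_score <= RISK_THRESHOLD:
        return Route.AUTOMATE
    # Anything unfamiliar or high risk goes to a person by default.
    return Route.ANALYST
```

Defaulting to the analyst queue means that a false positive from automation can only affect alerts the organization has already judged to be low impact.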

When deciding whether to adopt automation, organizations need to be aware of its pros and cons. If this assessment is carried out correctly, they are likely to find that the advantages of this approach outweigh the disadvantages, which can also be controlled and managed to minimize any potential negative impact.

Looking at the pros and cons of automation, it’s easy to see that the most important benefit is that it allows SOCs to monitor and analyze many more incidents than they could manually, freeing up the security team’s bandwidth to focus on high-risk and high-impact alerts. Other key benefits include a more consistent response to alerts and tickets, a higher volume of ticket closure and response to incidents, and coverage of a larger area and a larger number of tickets. On the other hand, automation can yield false positives, which can direct time and resources towards alerts that are not legitimate attacks. In the worst case, an organization may shut down operations unnecessarily, with an impact on its business and its bottom line.

All said and done, automated incident response has the potential to bring significant benefits to organizations, provided that it’s implemented properly and cautiously, with a well-thought-out strategy. Overall, it should be a serious consideration for any SOC that has to handle large volumes of alerts on a daily basis.

For further information on SOC automation, read the recent SANS Institute Spotlight Paper “SOC Automation – Disaster or Deliverance”.

3 Best Practices for Incident Categorization to Support Key Performance Indicators

The DNA sequence of each human is 99.5% similar to that of any other human. Yet when it comes to incident response and the manner in which individual analysts may interpret the details of a given scenario, our near-total similarity seems to all but vanish. Where one analyst might characterize an incident as the result of a successful social engineering attack, another may instead identify it as a generic malware infection. Similarly, a service outage may be labeled as a denial of service by some, while others will choose to attribute the root cause to an improper procedure carried out by a systems administrator. Root cause and impact, or incident outcome, are just two of the many considerations that, unless properly accounted for in a case management process, will play havoc with a security team’s reporting metrics.

Poor Key Performance Indicators Can Blind Decision Makers

What is the impact of poor KPIs? All too often the end result is equally poor strategic decisions. Money and effort may be assigned to the wrong measures, for example to more ineffective prevention controls instead of improved response capability. In a worst-case scenario, poor KPIs can blind decision makers to the most pertinent security issues of their enterprise, and the necessary funding for additional security may be withheld altogether.

Three best practices are required to address the all-too-common problem of attaining accurate reporting:

  1. A coherent incident management process is necessary in order to properly categorize incident activity. Its definitions must be clear, taking into account outliers, clarifying how root causes and impacts are to be tracked, and providing a workflow to assist analysts in accurately and consistently determining incident categorization.
  2. The process must be enforced to guarantee uniform results in support of coherent KPIs. Training, quality assurance, and reinforcement are all necessary to ensure total stakeholder buy-in.
  3. Security teams must have the technologies to support effective incident response and proper categorization of incidents.
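
The first practice, clear definitions that track root cause and impact separately, can be sketched as a small fixed taxonomy. This is only an illustration under assumed names: the category values below are hypothetical and drawn from the examples earlier in this article, not from any standard or product.

```python
from dataclasses import dataclass
from enum import Enum


class RootCause(Enum):
    SOCIAL_ENGINEERING = "social_engineering"
    MALWARE = "malware"
    ADMIN_ERROR = "admin_error"
    UNKNOWN = "unknown"


class Impact(Enum):
    MALWARE_INFECTION = "malware_infection"
    SERVICE_OUTAGE = "service_outage"
    DATA_EXPOSURE = "data_exposure"
    NONE = "none"


@dataclass(frozen=True)
class IncidentCategory:
    root_cause: RootCause  # how the incident came about
    impact: Impact         # what the incident resulted in


def categorize(root_cause: str, impact: str) -> IncidentCategory:
    """Accept only values from the agreed taxonomy; free-text labels raise ValueError."""
    return IncidentCategory(RootCause(root_cause), Impact(impact))
```

With root cause and impact stored as separate fields, the analyst who sees a social engineering attack and the analyst who sees a malware infection can both be right, and the record captures both facts instead of forcing a choice between them.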

There are several ways that the IncMan platform supports the three best practices:

First, IncMan provides a platform to act as the foundation for an incident management program. It provides customizable incident forms allowing for complete tailoring to an organization and the details it must collect in support of its unique reporting requirements. Custom fields specific to distinct incident types allow for detailed data collection and categorization. These custom fields can be coupled with common attributes to track specific data, thereby providing a high level of flexibility for security teams in maintaining absolute reporting consistency across the team’s individual members.

Next, playbooks can be associated with specific incident types, providing step-by-step instructions for specialized incident response activities. Playbooks enforce consistency and can further reinforce reporting requirements. However, playbooks are not completely static, and while they certainly provide structure, IncMan’s playbooks also offer the ability to improvise, add, remove or substitute actions on the fly.
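
The idea of a playbook that provides structure but can still be adjusted per incident can be sketched generically as follows. This does not represent IncMan's playbook format or API; the incident types, step names, and helper functions are hypothetical and serve only to show the pattern of a fixed template plus per-incident edits.

```python
# Hypothetical playbook templates keyed by incident type.
PLAYBOOKS = {
    "phishing": [
        "collect email headers",
        "check sender reputation",
        "notify affected user",
    ],
    "malware": [
        "isolate host",
        "collect memory image",
        "scan for indicators",
    ],
}


def start_playbook(incident_type: str) -> list[str]:
    """Return a per-incident copy so edits never alter the shared template."""
    return list(PLAYBOOKS[incident_type])


def substitute_step(steps: list[str], old: str, new: str) -> list[str]:
    """Return a new step list with every occurrence of `old` replaced by `new`."""
    return [new if step == old else step for step in steps]


# Usage: start from the template, then improvise for this one incident.
steps = start_playbook("phishing")
steps.append("reset user credentials")
steps = substitute_step(steps, "notify affected user", "notify security lead")
```

Because each incident works on its own copy, the template continues to enforce a consistent starting point while analysts remain free to add, remove, or substitute actions.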

The platform’s Knowledge Base offers a repository for reference material to further supplement playbook instructions. Information collection requirements defined within playbook steps can be linked to Knowledge Base references, arming analysts with added information, for example with standard operating procedures pertaining to individual enterprise security tools, or checklists for applicable industry reporting requirements.

IncMan also includes Automated Responder Knowledge (ARK), a machine-learning-driven approach that learns from past incidents and the responses to them in order to suggest suitable playbooks for new or related incident types. This is not only useful for helping to identify specific campaigns and otherwise connected incident activity, but can also highlight historical cases that can serve as examples for new or novice analysts.

Finally, the platform’s API and KPI export capabilities enable the extraction of raw incident data, allowing for data mining of valuable reporting information using external analytics tools. This information can then be used to paint a much clearer picture of an enterprise’s security posture and allow for fully-informed strategic decision-making.
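
As a rough sketch of what such external analysis might look like, the snippet below computes two simple KPIs from exported incident records. The CSV layout, column names, and sample rows are assumptions made for illustration; they do not reflect the platform's actual export format.

```python
import csv
import io
from collections import Counter
from datetime import datetime
from statistics import mean

# Hypothetical export: column names and rows are invented for this example.
SAMPLE_EXPORT = """incident_id,root_cause,opened_at,closed_at
1,social_engineering,2024-01-01T09:00:00,2024-01-01T11:00:00
2,malware,2024-01-02T10:00:00,2024-01-02T16:00:00
3,social_engineering,2024-01-03T08:00:00,2024-01-03T12:00:00
"""


def compute_kpis(csv_text: str) -> dict:
    """Count incidents per root cause and average the hours from open to close."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_cause = Counter(row["root_cause"] for row in rows)
    hours_to_close = [
        (
            datetime.fromisoformat(row["closed_at"])
            - datetime.fromisoformat(row["opened_at"])
        ).total_seconds() / 3600
        for row in rows
    ]
    return {
        "incidents_by_root_cause": dict(by_cause),
        # None when there are no incidents, since a mean is undefined.
        "mean_hours_to_close": mean(hours_to_close) if hours_to_close else None,
    }


kpis = compute_kpis(SAMPLE_EXPORT)
```

Figures like these are only as reliable as the categorization behind them, which is why consistent root-cause labels matter before any reporting is attempted.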

Collectively, the IncMan features detailed above empower an organization with the means to support consistency in incident categorization, response, and reporting. For more information, please visit us at https://www.dflabs.com