August 31, 2023

17 Minute Read

Incident Management: The Complete Guide

By Laiba Siddiqui

Incident response (IR) is the set of strategic and organized actions an organization takes in the immediate aftermath of a cyberattack or security breach. Incident response actions have the ultimate goal of reducing the risk of future incidents. As such, incident response plans aim to:

Swiftly identify the attack or incident.
Mitigate its impact.
Contain the damage.
Address the root cause.

IR involves planning, preparation, detection, containment, recovery, and remediation efforts to safeguard your organization's digital assets and minimize the adverse consequences of cybersecurity incidents.

What are "security incidents"?

In cybersecurity, many kinds of incidents can threaten an organization's network, and some lead to unauthorized intrusions: people who should not be there are getting into your network. These incidents vary in their methods, intentions, and potential consequences, and they demand diligent vigilance and robust security measures.

Understanding and preparing for these types of security incidents is crucial for organizations seeking to protect their digital assets and maintain the security and integrity of their networks. It's important to implement robust security measures, conduct regular risk assessments, and have a well-defined incident response plan to mitigate the impact of these incidents.

Some of the common types of cybersecurity incidents (and security breaches) include:

Unauthorized attempts to access systems or data

Unauthorized access incidents occur when an individual or a group attempts to infiltrate an organization's systems or access its data without permission. Examples include hacking attempts, in which attackers use various techniques to breach defenses; brute force attacks, which involve trying numerous password combinations to gain entry; and social engineering, a manipulation tactic aimed at tricking people into revealing sensitive information.
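Brute force attempts usually leave a recognizable trace in authentication logs: many failed logins from one source in a short time. The Python sketch below flags such sources with a sliding time window. The event format `(timestamp, source_ip, success)`, the function name `detect_brute_force`, and the default of five failures per minute are all illustrative assumptions. They are not taken from any particular product.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def detect_brute_force(events, threshold=5, window=timedelta(minutes=1)):
    """Return source IPs with at least `threshold` failed logins inside `window`.

    `events` is an iterable of (timestamp, source_ip, success) tuples sorted by
    timestamp. The format and defaults are illustrative assumptions.
    """
    recent_failures = defaultdict(deque)  # source_ip -> timestamps of recent failures
    flagged = set()
    for ts, source_ip, success in events:
        if success:
            continue  # only failed attempts count toward the threshold
        failures = recent_failures[source_ip]
        failures.append(ts)
        # Discard failures older than the window, measured from this event.
        while ts - failures[0] > window:
            failures.popleft()
        if len(failures) >= threshold:
            flagged.add(source_ip)
    return flagged


start = datetime(2023, 8, 31, 12, 0, 0)
# Six failures from one address, five seconds apart (documentation IP ranges).
events = [(start + timedelta(seconds=5 * i), "203.0.113.7", False) for i in range(6)]
events.append((start + timedelta(seconds=40), "198.51.100.2", True))
events.sort()
print(detect_brute_force(events))  # {'203.0.113.7'}
```

The threshold trades sensitivity for noise. A low value catches slow attacks but may also flag users who simply mistype their passwords.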

Privilege escalation attack

Privilege escalation incidents involve an attacker gaining access to a system with limited permissions and then exploiting vulnerabilities or utilizing stolen credentials to acquire higher-level privileges. This can result in unauthorized access to critical resources and data, posing a significant risk to an organization's security.

Insider threat

Insider threat incidents occur when a current or former employee, contractor, or someone with access privileges within an organization misuses their access for malicious purposes. Examples of insider threats include stealing sensitive information, intentionally damaging systems, or engaging in acts of sabotage that can have severe consequences.

Phishing attack

Phishing incidents involve attackers sending deceptive emails or messages that appear to originate from legitimate sources but are, in reality, clever traps.

The primary objective of phishing is to deceive recipients into divulging sensitive information or to spread malware through malicious attachments or links.

Malware attack

Malware incidents involve the use of malicious software, such as viruses or Trojan horses, to compromise an organization's systems or data.

Different types of malware serve various purposes, from gaining unauthorized access to systems to disrupting normal operations. For instance, ransomware encrypts data and demands a ransom for its release.

Denial-of-Service (DoS) attack

A DoS incident occurs when an attacker floods a system or network with excessive traffic, rendering it unavailable to legitimate users.

The intention is to disrupt operations and services, causing inconvenience or financial harm to the organization.

Man-in-the-Middle (MitM) attack

In a MitM incident, an attacker intercepts and potentially alters the communication between two parties without their knowledge.

Attackers can steal sensitive information or inject malicious content into the communication, compromising the confidentiality and integrity of data.

Advanced Persistent Threat (APT)

APTs represent sophisticated and targeted attacks designed to gain access to an organization's systems or data. These attacks are often orchestrated with the intention of stealing sensitive information or maintaining a long-term presence within the network, making them particularly challenging to detect and counter.

Ransomware

Ransomware is a type of malicious software (malware) designed to encrypt a victim's files or lock them out of their computer system until a ransom is paid to the attacker. The ransom is typically demanded in cryptocurrency, such as Bitcoin, which provides a level of anonymity to the cybercriminals. Ransomware attacks are a significant cybersecurity threat, and they can have devastating consequences for individuals, businesses, and organizations.

(Related reading: beware these latest trends in ransomware.)

SANS 6 Steps of an Incident Response Plan

The SANS Institute, a renowned organization in the field of cybersecurity, has outlined a comprehensive six-phase incident response lifecycle, which provides a structured approach for handling cybersecurity incidents. These phases are designed to be repeated for each incident that occurs. Repeating them continually improves an organization's incident response capabilities, as well as its overall security posture and readiness to respond to future threats.

Here's an in-depth explanation of each phase:

Step 1. Preparation

In the preparation phase, the organization reviews its existing security measures, policies, and procedures to assess their effectiveness. This typically involves conducting a risk assessment to identify vulnerabilities and prioritize critical assets.

The findings from the risk assessment inform the development or refinement of incident response plans, including communication plans and the assignment of roles and responsibilities for the incident response team.

This phase is about enhancing the organization's readiness to respond to incidents and ensuring that high-priority assets are adequately protected.

Step 2. Identification of Incidents

During this phase, security teams use the tools and procedures established in the preparation phase to detect and identify suspicious or malicious activity within the organization's network and systems.

When an incident is detected, the response team works to understand:

The nature of the attack
Its source
The attacker's objectives

This phase also involves protecting and preserving any evidence related to the incident for further analysis and potential legal action. Communication plans are initiated to inform stakeholders, authorities, legal counsel, and users about the incident.

Step 3. Containment of Attackers and Incident Activity

Once an incident is confirmed, the focus shifts to containment, with the goal of limiting the damage caused by the attack. Quick containment minimizes the attacker's ability to cause further harm.

Containment is usually carried out in two phases:

Short-term containment isolates immediate threats.
Long-term containment applies additional access controls to unaffected systems.

For example, this may involve segmenting off the compromised network area or taking infected servers offline while rerouting traffic to failover systems.

Step 4. Eradication of Attackers and Re-entry Options

In this phase, the incident response team gains a comprehensive understanding of the extent of the attack and identifies all affected systems and resources. The focus is on ejecting attackers from the network and eliminating malware from compromised systems. This phase continues until all traces of the attack are removed.

Depending on the severity of the incident, some systems may need to be taken offline and replaced with clean, patched versions during the recovery phase.

Step 5. Recovery from Incidents, Including Restoration of Systems

During the recovery phase, the incident response team brings updated or replacement systems online. The goal is to return systems to normal operation. Ideally, data and systems can be restored without data loss, but in some cases, it may be necessary to recover from the last clean backup.

The recovery phase also includes monitoring systems to ensure that attackers do not return or re-exploit vulnerabilities.

Step 6. Lessons Learned and Application of Feedback to the Next Round of Preparation

The final phase involves a comprehensive review of the incident response process. Team members evaluate what worked well, what didn't, and identify areas for improvement.

Lessons learned, along with feedback and suggestions, are documented to inform the next round of preparation. Any incomplete documentation is wrapped up during this phase. This phase is essential for continuous improvement in incident response capabilities.

NIST 4 Phases Incident Response

In addition to the SANS 6 steps, the NIST 4 phases are a common approach to incident response. The NIST incident response cycle consists of four key phases, each with specific goals and roles in the incident response process:

Phase 1. Preparation

The preparation phase focuses on getting the organization ready to respond to cybersecurity incidents effectively. It includes establishing an incident response policy, team, and communication plan, as well as implementing preventative measures to reduce the risk of incidents.

In this phase, the organization assesses its risk environment, applies security best practices to systems and networks, secures the network perimeter, deploys anti-malware tools, and provides training to users. It involves creating an environment where the incident response team can quickly mobilize and coordinate their efforts when needed.

Phase 2. Detection and Analysis

This phase involves identifying the type of threat an organization is facing and determining whether it constitutes an incident. It includes detecting and analyzing signs of potential incidents.

During detection and analysis, the organization looks for precursors (signs that an incident may occur in the future) and indicators (evidence that an incident may be occurring or has already occurred). Log analysis and monitoring are used to identify anomalies. Synchronized system clocks make it possible to correlate events recorded by different systems. Incidents are documented and prioritized, and this information is then used to respond effectively.
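Correlating events across systems depends on placing them on one timeline. The Python sketch below merges events from several hypothetical systems and orders them in UTC. It assumes each log line carries an ISO 8601 timestamp with an explicit UTC offset. The system names and messages are made up.

```python
from datetime import datetime, timezone


def build_timeline(events):
    """Merge events from several systems into one timeline ordered in UTC.

    `events` is a list of (system, iso_timestamp, message) tuples whose
    timestamps carry an explicit UTC offset (an assumption of this sketch).
    """
    normalized = []
    for system, stamp, message in events:
        ts = datetime.fromisoformat(stamp)
        if ts.tzinfo is None:
            # Without an offset there is no safe way to place the event.
            raise ValueError(f"{system}: timestamp {stamp!r} has no UTC offset")
        normalized.append((ts.astimezone(timezone.utc), system, message))
    return sorted(normalized)


events = [
    ("vpn", "2023-08-31T08:04:00-04:00", "session opened for service account"),
    ("web-server", "2023-08-31T12:03:00+00:00", "admin login from new IP"),
    ("firewall", "2023-08-31T14:05:00+02:00", "outbound connection to unknown host"),
]
for ts, system, message in build_timeline(events):
    print(ts.isoformat(), system, message)
# 2023-08-31T12:03:00+00:00 web-server admin login from new IP
# 2023-08-31T12:04:00+00:00 vpn session opened for service account
# 2023-08-31T12:05:00+00:00 firewall outbound connection to unknown host
```

Read by their local clock values, the VPN event looks like the earliest. In UTC, the web-server login comes first. This sketch only reconciles time zones. If a system's clock has drifted, its timestamps are already wrong at the source, and that is the problem clock synchronization (for example, via NTP) prevents.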

Phase 3. Containment, eradication & recovery

The bulk of active incident response takes place in this phase. The primary objectives are to contain the threat, eradicate it, and recover affected systems to resume normal operations.

Containment strategies are defined based on the type of attack and the potential damage. Incident response teams work to:

Isolate the threat.
Identify the attacking host.
Gather evidence.
Understand the attack’s behavior.

Eradication involves removing malware and compromised accounts.

The recovery phase focuses on restoring systems from clean backups, implementing security patches, and improving defenses.

Phase 4. Post-incident activity

This often-overlooked phase is crucial for learning from the incident and improving future incident response efforts. It includes conducting a "Lessons Learned" meeting, preserving data and evidence, and revisiting preparation for future cybersecurity threats.

In the post-incident phase, the organization conducts a thorough review of the incident, documenting key findings and strategies for improvement. Data collected during the incident is preserved, and the incident response team assesses its performance against established baselines and metrics. The findings and lessons learned can inform future incident response and prevention efforts. Additionally, organizations are encouraged to share their insights with other entities to enhance collective cybersecurity knowledge.
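Two metrics commonly used as baselines are mean time to detect (MTTD) and mean time to resolve (MTTR). The Python sketch below computes both from a list of past incidents. Definitions vary between organizations. Here MTTR is assumed to run from detection to resolution, and the `IncidentTimes` record and sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimes:
    occurred: datetime  # best estimate of when the incident began
    detected: datetime
    resolved: datetime


def mean_duration(durations):
    """Average a non-empty sequence of timedelta values."""
    durations = list(durations)
    return sum(durations, timedelta()) / len(durations)


def response_metrics(incidents):
    """Compute mean time to detect and mean time to resolve (measured from detection)."""
    return {
        "mean_time_to_detect": mean_duration(i.detected - i.occurred for i in incidents),
        "mean_time_to_resolve": mean_duration(i.resolved - i.detected for i in incidents),
    }


day = datetime(2023, 8, 31)
past_incidents = [
    IncidentTimes(day.replace(hour=9), day.replace(hour=9, minute=30), day.replace(hour=11, minute=30)),
    IncidentTimes(day.replace(hour=13), day.replace(hour=14, minute=30), day.replace(hour=15, minute=30)),
]
for name, value in response_metrics(past_incidents).items():
    print(name, value)
# mean_time_to_detect 1:00:00
# mean_time_to_resolve 1:30:00
```

Tracking these values over successive incidents shows whether changes made after a review actually shortened detection or resolution.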

(Check out our full guide: how to conduct incident reviews & postmortems.)

Incident response solutions & technologies

Commonly used incident response technologies encompass a range of tools and solutions that play crucial roles in identifying, analyzing, and mitigating security incidents. Some of these technologies include:

SIEM (Security Information and Event Management)

SIEM systems serve as centralized platforms for aggregating and correlating security event data from many sources, including firewalls, vulnerability scanners, and threat intelligence feeds.

SIEM helps incident response teams sift through the vast volume of notifications generated by these tools, enabling them to focus on indicators of actual threats and reduce 'alert fatigue.'
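Real SIEM correlation rules vary widely. The Python sketch below illustrates one simple, hypothetical rule: duplicate alerts are collapsed, and a host is surfaced only when at least two different tools report it. The alert fields, tool names, and `correlate_alerts` function are assumptions made for illustration.

```python
def correlate_alerts(alerts, min_sources=2):
    """Group alerts by host and keep hosts reported by at least `min_sources` tools.

    `alerts` is a list of dicts with "source" (tool name) and "host" keys.
    Returns {host: sorted list of distinct reporting tools}.
    """
    sources_by_host = {}
    for alert in alerts:
        # Duplicate alerts from the same tool collapse into one set entry.
        sources_by_host.setdefault(alert["host"], set()).add(alert["source"])
    return {
        host: sorted(sources)
        for host, sources in sources_by_host.items()
        if len(sources) >= min_sources
    }


alerts = [
    {"source": "firewall", "host": "db-01"},
    {"source": "firewall", "host": "db-01"},
    {"source": "edr", "host": "db-01"},
    {"source": "vuln-scanner", "host": "web-02"},
    {"source": "firewall", "host": "web-03"},
]
print(correlate_alerts(alerts))  # {'db-01': ['edr', 'firewall']}
```

In this example, five raw alerts become one corroborated finding, which is the kind of reduction that eases alert fatigue.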

SOAR (Security Orchestration, Automation, and Response)

SOAR technology empowers security teams to define playbooks, which are structured workflows that coordinate different security operations and tools in response to security incidents. It also facilitates the automation of specific tasks within these workflows, improving efficiency in incident response.
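A playbook can be thought of as an ordered list of steps that share one incident context. The Python sketch below models that idea. All step names, context keys, and the `run_playbook` helper are hypothetical, and the steps record the actions they would take instead of calling real firewall or paging APIs.

```python
from typing import Callable, Sequence

# A playbook step receives the shared incident context and may update it.
Step = Callable[[dict], None]


def enrich_ip(ctx: dict) -> None:
    # Hypothetical enrichment: a real step would query a threat intelligence source.
    ctx["ip_reputation"] = "malicious" if ctx["source_ip"] in ctx["blocklist"] else "unknown"


def block_ip(ctx: dict) -> None:
    # Records the containment action instead of calling a real firewall API.
    if ctx["ip_reputation"] == "malicious":
        ctx["actions"].append(f"block {ctx['source_ip']} at firewall")


def notify_on_call(ctx: dict) -> None:
    ctx["actions"].append(f"notify on-call: {ctx['alert']}")


def run_playbook(steps: Sequence[Step], ctx: dict) -> list:
    """Run each step in order against one context and return the recorded actions."""
    ctx.setdefault("actions", [])
    for step in steps:
        step(ctx)
    return ctx["actions"]


SUSPICIOUS_LOGIN_PLAYBOOK = [enrich_ip, block_ip, notify_on_call]

ctx = {"alert": "suspicious login", "source_ip": "203.0.113.7", "blocklist": {"203.0.113.7"}}
print(run_playbook(SUSPICIOUS_LOGIN_PLAYBOOK, ctx))
# ['block 203.0.113.7 at firewall', 'notify on-call: suspicious login']
```

Because each step is a plain function, steps can be reused across playbooks, and any step can be swapped for a manual approval where full automation is too risky.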

(Learn more: SIEM vs SOAR: What’s The Difference?)

EDR (Endpoint Detection and Response)

EDR software is designed to provide automatic protection for an organization's end users, endpoint devices, and IT assets against cyberthreats that can bypass traditional antivirus software and other endpoint security tools. EDR continuously collects data from all network endpoints, analyzing it in real time to detect known or suspected cyberthreats and respond automatically to prevent or minimize potential damage.

XDR (Extended Detection and Response)

XDR is a cybersecurity technology that unifies security tools, data sources, telemetry, and analytics across various parts of the hybrid IT environment, including endpoints, networks, and both private and public clouds.

XDR aims to create a centralized system for threat prevention, detection, and response, helping security teams and Security Operations Centers (SOCs) streamline their efforts by eliminating tool silos and automating responses throughout the entire cyberthreat kill chain.

(Learn more: EDR vs XDR vs MDR: What’s The Difference?)

UEBA (User and Entity Behavior Analytics)

UEBA leverages behavioral analytics, machine learning algorithms, and automation to identify abnormal and potentially hazardous user and device behavior. It is particularly effective at detecting insider threats, such as malicious insiders or hackers using compromised insider credentials. UEBA functionality is often integrated into SIEM, EDR, and XDR solutions, enhancing their capabilities in identifying and responding to security incidents.
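UEBA products generally rely on machine learning models. The Python sketch below substitutes a much simpler statistical baseline, a z-score, to show the underlying idea of comparing an entity's current behavior with its own history. The threshold of three standard deviations and the download counts are illustrative assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, today, z_threshold=3.0):
    """Flag `today` if it exceeds the entity's baseline by more than `z_threshold` std devs.

    `history` needs at least two observations (a requirement of statistics.stdev).
    """
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat baseline: any increase is unusual.
        return today > baseline
    return (today - baseline) / spread > z_threshold


# Files downloaded per day by one user over the past week (made-up data).
downloads = [12, 15, 11, 14, 13, 12, 16]
print(is_anomalous(downloads, 17))   # False
print(is_anomalous(downloads, 240))  # True
```

A sudden spike like 240 downloads might indicate an insider collecting data or an attacker using stolen credentials. The baseline cannot tell which, so a flag like this starts an investigation rather than concluding one.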

ASM (Attack Surface Management)

ASM solutions automate the continuous process of discovering, analyzing, remediating, and monitoring vulnerabilities and potential attack vectors across an organization's entire attack surface. These solutions can uncover previously unmonitored network assets, establish relationships between assets, and provide essential insights to enhance overall security.

These incident response technologies play crucial roles in helping organizations bolster their cybersecurity efforts, detect and respond to threats more effectively, and manage their attack surface to reduce vulnerabilities and potential attack vectors.

Why organizations need strong incident response

Incident response is critically important for organizations for a variety of reasons:

Cybersecurity threats

Organizations face a constant and evolving threat from cyberattacks and security breaches. These threats can result in:

Data breaches
Financial losses
Damage to reputation
Legal or regulatory consequences.

Incident response helps organizations prepare for, respond to, and recover from these threats effectively.

Minimizing damage

The quicker an organization can respond to a cybersecurity incident, the less damage it is likely to suffer. Incident response aims to identify and mitigate the impact of incidents promptly, reducing potential financial losses and operational disruption.

Protecting data and assets

Incidents, if not managed effectively, can result in the loss or theft of sensitive data and intellectual property. Incident response measures help protect an organization's critical assets and ensure data confidentiality, integrity, and availability.

Reputation management

Public perception of an organization can be significantly impacted by how it responds to a cybersecurity incident:

A well-executed incident response can help maintain or even enhance an organization's reputation.
A poorly managed incident can lead to public distrust and reputational damage.

Legal and regulatory compliance

Many industries and jurisdictions have specific legal and regulatory requirements for incident reporting and handling. Non-compliance can lead to legal consequences, fines, and other penalties. Incident response helps organizations meet these obligations.

Operational continuity

Effective incident response can minimize disruptions to an organization's operations. By quickly identifying and containing threats, incident response helps maintain business continuity and ensures that daily operations continue as smoothly as possible.

Risk mitigation

Incident response planning includes risk assessments, helping organizations identify vulnerabilities and weaknesses. By understanding these risks, organizations can take proactive steps to prevent incidents and reduce their likelihood.

Continuous improvement

Incident response is an iterative process. Each incident provides an opportunity to learn and improve response strategies, making the organization more resilient and better prepared for future incidents.

Stakeholder trust

Customers, partners, investors, and other stakeholders expect organizations to safeguard their data and assets. Demonstrating a commitment to incident response and cybersecurity can build trust and confidence among these groups.

Regaining control

During an incident, confusion and panic can reign. Having a well-defined incident response plan provides a structured approach, enabling the organization to regain control, coordinate response efforts, and make informed decisions.

Locking it up

In summary, incident response is essential for organizations to protect themselves from the ever-present and evolving threats in the digital landscape. It helps organizations safeguard their data, minimize damage, maintain trust, and meet legal and regulatory obligations. A well-executed incident response strategy is a cornerstone of modern cybersecurity risk management.

Disruptive cybersecurity incidents are becoming more common every day. Even if nothing is directly hacked, these incidents can harm your systems and networks. Navigating cybersecurity incidents is a constant challenge, and the best way to stay ahead of the game is with effective incident management.

This article will explore definitions, benefits, a 6-step process for incident management and much more — all so you can know good incident management when you see it, or improve incident management in your own organization. Let’s get started.

What is an incident?

Before diving into managing incidents, let’s get on the same page about what we consider an incident. NIST defines a cyber incident as:

"Actions taken through the use of an information system or network that result in an actual or potentially adverse effect on an information system, network, and/or the information residing therein."

Breaches are, of course, one type of incident. But it's important to remember that an incident doesn't necessarily mean a breach occurred. It may simply mean that information or systems are threatened. Here are a few examples of incidents in cybersecurity:

Data breaches
Reduced integrity of information systems
Unauthorized access to information systems
Unauthorized use of information systems or electronic communications networks

Types of incidents

Incidents are categorized into different severity levels based on their impact and urgency. Here's a general breakdown of five severity levels, from 1 (most severe) to 5 (least severe):

Severity 1 (critical incident): affects users in production.
Severity 2 (significant problem): affects a limited number of users in production.
Severity 3 (incident): causes errors, minor issues, or a heavy system load.
Severity 4 (minor problem): affects the service but doesn't seriously impact users.
Severity 5 (low-level deficiency): causes minor problems.

Defining incident management

With that out of the way, let’s define what exactly incident management is all about.

Incident management is the process of identifying, managing, recording, and analyzing cybersecurity threats and incidents. Doing so minimizes the impact of incidents on business operations and helps prevent them in the future.

A dedicated incident handling team, ready to implement an effective response plan as soon as it encounters an incident, is key to any successful business.

(See how Splunk solutions support the entire incident management practice.)

Incident management vs. problem management

Incident management and problem management are two processes within IT service management (ITSM) that focus on two aspects:

Maintaining the existing IT services.
Improving the quality of services and minimizing disruptions to the business.

But the two processes differ. Incident management focuses on restoring services to normal after a disruption. Problem management identifies and eliminates the root causes of incidents to prevent them from recurring. Together, these processes enhance the reliability and stability of IT services and minimize the impact of disruptions on the business.

Benefits of incident management

Incident management helps organizations identify, manage, record, and analyze cybersecurity threats and incidents. Here are some of its benefits:

Reduced downtime

You can minimize the downtime associated with cyberattacks, data breaches, or system failures by quickly identifying and resolving incidents. This will help maintain service quality, increase productivity, and ensure a better end-user experience.

Improved customer trust and satisfaction

An effective incident management process helps your organization protect its reputation, limit the damage from cyberattacks, and prevent data leaks. All of these build customer trust and satisfaction.

Increased operational resilience

Incident management also helps organizations become more resilient against future incidents by identifying vulnerabilities and implementing measures to prevent similar situations from arising again.

Strengthened overall security posture

A coordinated process for detecting, analyzing, and responding to security incidents also strengthens the organization's overall security posture.

Better end-to-end visibility

You will also gain end-to-end visibility into the incident lifecycle, from detection to resolution. This can help organizations identify areas for improvement and optimize their incident response processes.

Best tips for efficient incident management

Here are some tips and best practices to manage sudden incidents within your organization:

Establish a clear process that outlines the steps to be taken if an incident occurs. This process should include the following elements:

Incident identification
Logging
Categorization
Prioritization
Investigation
Resolution and closure

Define the roles and responsibilities of the incident management team, including the incident manager, responders, and other stakeholders. This will help ensure everyone knows what is expected of them during an incident.

Use automation tools to streamline the entire procedure. Automation can reduce response times, improve accuracy, and free up resources for more critical tasks. Some organizations opt for a managed detection and response (MDR) service to minimize response times.

Regularly train team members on emerging threats and on how to handle incidents effectively. Well-trained teams can quickly identify gaps in the process and improve response times.

Continuously monitor and improve the incident management process by analyzing incident data, identifying trends, and implementing changes to prevent similar incidents from occurring in the future.

The 6-step incident management process

Your organization can become more resilient against future incidents by implementing the right safety measures. Here's a 6-step process to approach incident management:

Step 1: Identify the incident

The first step is to detect the incident. This means identifying abnormal or unexpected events that could disrupt normal operations within the organization. Your team can do this through various means, such as:

Monitoring tools
User reports
Automated alerts
System logs

Step 2: Log the incident

Once your team has identified an incident, start documenting each detail. To create a detailed record of the incident, you should include the following:

Its description
The time it was detected
Names of the team members handling the incident
An initial assessment of its severity and its impact on the organization

This record is a starting point for tracking progress, and it supports communication between the incident response team and stakeholders.
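The record described above maps naturally onto a small data structure. The Python sketch below is one possible shape for it. The `IncidentRecord` class, its field names, and the use of the 1 to 5 severity scale from earlier in this article are assumptions. They do not describe the schema of any particular ticketing tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class IncidentRecord:
    description: str
    detected_at: datetime
    handlers: List[str]          # team members handling the incident
    initial_impact: str          # free-text first assessment of business impact
    initial_severity: int        # assumed scale: 1 (most severe) to 5 (least severe)
    updates: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 1 <= self.initial_severity <= 5:
            raise ValueError("initial_severity must be between 1 and 5")

    def add_update(self, note: str) -> None:
        """Append a UTC-timestamped progress note for responders and stakeholders."""
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        self.updates.append(f"{stamp} {note}")


record = IncidentRecord(
    description="Ransom note found on shared file server",
    detected_at=datetime(2023, 8, 31, 9, 15, tzinfo=timezone.utc),
    handlers=["on-call analyst", "storage admin"],
    initial_impact="file shares unavailable to finance team",
    initial_severity=2,
)
record.add_update("File server isolated from the network")
print(len(record.updates))  # 1
```

The timestamped `updates` list makes the record a running log that stakeholders can follow as the incident progresses.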

Step 3: Categorize the incident

After logging the incident, categorize it based on predefined criteria. This helps your team understand the nature of the incident, its potential impact on the business, and the resources required to resolve it.

There are different categories of incidents, and the most common ones are:

Hardware failures
Software glitches
Security breaches

Once you've categorized the incident, you will know how to allocate the appropriate teams and resources to address the incident.
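Once categories are defined, routing an incident to the right team can be a simple lookup. The Python sketch below uses the three categories listed above. The team names, the `route` function, and the fallback to a triage queue are hypothetical.

```python
# Hypothetical routing table: category -> team that owns the response.
ROUTING = {
    "hardware failure": "infrastructure team",
    "software glitch": "application support team",
    "security breach": "security incident response team",
}


def route(category: str) -> str:
    """Return the owning team, sending unrecognized categories to triage."""
    return ROUTING.get(category.strip().lower(), "service desk triage")


print(route("Security breach"))   # security incident response team
print(route("network outage"))    # service desk triage
```

The fallback matters: an incident that fits no predefined category should still land with someone rather than go unassigned.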

Step 4: Prioritize the incident

Not all incidents have the same level of urgency or impact, so you should prioritize based on severity and potential consequences.

Prioritization ensures that the most critical incidents are addressed first—reducing the impact on business operations and minimizing downtime. It'll also guide your incident response team's actions.
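A common way to combine severity and consequences in IT service management is an impact and urgency matrix. The Python sketch below uses an illustrative 3x3 matrix that yields priorities 1 to 5 and then orders a queue by priority. The matrix values and incident IDs are assumptions, not a standard every organization uses.

```python
# Hypothetical 3x3 matrix. Impact and urgency are each rated 1 (high) to 3 (low);
# the resulting priority runs from 1 (handle first) to 5 (handle last).
PRIORITY_MATRIX = {
    (1, 1): 1, (1, 2): 2, (1, 3): 3,
    (2, 1): 2, (2, 2): 3, (2, 3): 4,
    (3, 1): 3, (3, 2): 4, (3, 3): 5,
}


def priority(impact: int, urgency: int) -> int:
    return PRIORITY_MATRIX[(impact, urgency)]


def triage(queue):
    """Order incidents by priority; sorted() is stable, so ties keep queue order."""
    return sorted(queue, key=lambda inc: priority(inc["impact"], inc["urgency"]))


queue = [
    {"id": "INC-101", "impact": 3, "urgency": 2},  # priority 4
    {"id": "INC-102", "impact": 1, "urgency": 1},  # priority 1
    {"id": "INC-103", "impact": 2, "urgency": 2},  # priority 3
]
print([inc["id"] for inc in triage(queue)])  # ['INC-102', 'INC-103', 'INC-101']
```

Because the sort is stable, incidents with equal priority stay in the order they entered the queue, so earlier reports are not starved.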

Step 5: Respond to the incident

During this phase, you must develop and execute a well-defined plan to mitigate the incident's effects and restore normal operations. This can include:

Isolating affected systems
Investigating the root cause
Implementing temporary or permanent fixes
Communicating with stakeholders about the progress and resolution

Step 6: Close the incident

After your team has addressed the incident and normal operations are restored, the incident is considered resolved and the closure phase begins. This phase involves the following activities:

Documenting the actions taken during the incident response
Verifying that the issue has been completely resolved
Updating incident records with relevant information
Evaluating the incident management process itself
Identifying areas for improvement
Capturing lessons learned for future incidents
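One way to keep closure from being skipped is to block the status change until each activity above is marked complete. The Python sketch below shows that guard. The checklist keys, the incident dictionary layout, and the function names are hypothetical.

```python
# The closure activities listed above, as checklist keys (names are illustrative).
CLOSURE_CHECKLIST = (
    "actions_documented",
    "resolution_verified",
    "records_updated",
    "process_evaluated",
    "improvements_identified",
    "lessons_captured",
)


def missing_closure_items(incident: dict) -> list:
    """Return checklist items not yet marked complete on the incident."""
    done = incident.get("closure", {})
    return [item for item in CLOSURE_CHECKLIST if not done.get(item, False)]


def close_incident(incident: dict) -> dict:
    """Mark the incident closed only when every closure activity is complete."""
    missing = missing_closure_items(incident)
    if missing:
        raise ValueError(f"cannot close {incident['id']}: missing {', '.join(missing)}")
    incident["status"] = "closed"
    return incident


incident = {"id": "INC-102", "status": "resolved", "closure": {"actions_documented": True}}
print(missing_closure_items(incident)[:2])  # ['resolution_verified', 'records_updated']
```

The error message names the outstanding items, so the team knows exactly what remains before the incident can be closed.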

(Perfect your incident review & postmortem process with these best practices.)

Roles in incident management

Several roles and responsibilities are necessary for an effective incident response. Here are some of the most common:

The incident commander manages the incident response process. They coordinate and direct all facets of the incident response, including communication, resource allocation, and decision-making.

The incident responder responds to the incident and takes appropriate actions to contain and resolve it — this includes investigating the incident, restoring services, and implementing temporary fixes.

The IT operator monitors and maintains the IT infrastructure and systems. They identify and report incidents, perform routine maintenance, and troubleshoot issues.

The incident manager handles significant incidents that negatively affect the organization. This includes coordinating the incident response team, communicating with stakeholders, and ensuring that incidents are resolved quickly.

Incident analysts analyze incident data to identify trends and patterns. They determine the root causes of incidents, develop incident response plans, and recommend improvements to the incident management process.

Manage incidents to secure your operations

Managing incidents matters because it helps you detect and address the cybersecurity problems that affect your business operations. To do that, your team has to find, handle, track, and study security risks and incidents.

This posting does not necessarily represent Splunk's position, strategies or opinion.

Laiba Siddiqui

Laiba Siddiqui is an SEO writer who loves simplifying complex topics. She has helped companies like Data World, DataCamp, and Rask AI create engaging and informative content for their audiences. You can connect with her on LinkedIn.
