English for Reporting Incidents and Outages

Why Understanding Incident and Outage Reporting Matters in Today's Connected World

In our increasingly digital landscape, the ability to communicate technical problems clearly and efficiently can mean the difference between a minor disruption and a catastrophic business failure. When systems fail, networks crash, or services become unavailable, the clock starts ticking immediately. Every second of unclear communication compounds the problem, extends downtime, and multiplies costs. Technical professionals across industries face the same challenge: how to convey complex technical situations to diverse audiences while under pressure. The stakes couldn't be higher—customer trust, revenue streams, and organizational reputation all hang in the balance during these critical moments.

Incident and outage reporting refers to the structured process of documenting, communicating, and tracking technical failures, service disruptions, or system anomalies that impact normal operations. This specialized form of technical communication requires precision, clarity, and adaptability—you must speak the language of engineers, management, customers, and sometimes regulators, often simultaneously. This article explores the multifaceted nature of incident reporting from various professional perspectives, examining not just the technical vocabulary but also the communication strategies that separate effective responders from those who struggle during crises.

Throughout this comprehensive guide, you'll discover the essential terminology used across IT operations, telecommunications, and service management. You'll learn how to structure incident reports for maximum clarity, understand the critical difference between various severity levels, and master the art of communicating technical information to non-technical stakeholders. We'll examine real-world scenarios, explore best practices from multiple industries, and provide you with practical frameworks that you can implement immediately. Whether you're a help desk technician, network administrator, DevOps engineer, or IT manager, this resource will enhance your ability to navigate the challenging waters of incident communication with confidence and professionalism.

Core Terminology for Incident Classification and Severity

The foundation of effective incident reporting begins with understanding how to categorize and classify problems accurately. Organizations worldwide use standardized terminology to ensure everyone speaks the same language when disasters strike. An incident represents any unplanned interruption to service or reduction in quality, while an outage specifically refers to a complete loss of service availability. The distinction matters because it immediately signals the scope and urgency of the situation to all stakeholders.

Severity levels form the backbone of incident prioritization, and most organizations adopt a tiered system that ranges from critical to low impact. A critical incident or P1 (Priority 1) event represents a complete service failure affecting all users or critical business functions. These situations demand immediate response, often triggering emergency protocols and executive notification. One tier down, major incidents or P2 events cause significant disruption but may affect only a subset of users or provide degraded rather than absent service. Understanding these classifications helps you communicate urgency without creating unnecessary panic.

"The first words you use to describe an incident set the tone for the entire response. Choose them carefully, because you cannot take them back."

When documenting incidents, you'll frequently encounter terms that describe the scope and impact. Service degradation indicates that a system remains operational but performs below acceptable standards—perhaps with slower response times or reduced capacity. Intermittent issues come and go unpredictably, making them particularly challenging to diagnose and report. Widespread outages affect multiple systems, locations, or customer segments simultaneously, while localized incidents impact only specific areas or user groups. Each descriptor provides critical context that shapes the response strategy.

Status Indicators and Communication States

During active incidents, status communication becomes paramount. The term investigating signals that teams have acknowledged the problem and begun diagnostic work but haven't yet identified the root cause. When you report that you're monitoring a situation, you're indicating that a fix has been implemented but the team continues to observe system behavior to ensure stability. Resolved means the incident has been closed and service restored, while identified indicates the root cause is known even if the fix isn't yet complete.

| Status Term | Meaning | Typical Duration | Stakeholder Expectation |
|---|---|---|---|
| Detected | Issue identified, initial assessment underway | 5-15 minutes | Acknowledgment and initial timeline |
| Investigating | Active diagnosis, gathering data and logs | 15-60 minutes | Regular updates every 30 minutes |
| Identified | Root cause determined, solution being implemented | 30-120 minutes | Clear timeline for resolution |
| Monitoring | Fix deployed, observing for stability | 1-4 hours | Confirmation that service is recovering |
| Resolved | Incident closed, normal operations restored | N/A | Post-incident report and prevention measures |
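
To make this lifecycle concrete, the sketch below models the five statuses and their allowed transitions in Python. It is a minimal illustration, not a standard from any particular incident platform; the names and the rule that a failed fix sends an incident back to investigating are assumptions drawn from the table above.

```python
from enum import Enum

class IncidentStatus(Enum):
    DETECTED = "detected"
    INVESTIGATING = "investigating"
    IDENTIFIED = "identified"
    MONITORING = "monitoring"
    RESOLVED = "resolved"

# Allowed forward transitions; a fix that fails while under observation
# sends the incident back to investigating.
ALLOWED_TRANSITIONS = {
    IncidentStatus.DETECTED: {IncidentStatus.INVESTIGATING},
    IncidentStatus.INVESTIGATING: {IncidentStatus.IDENTIFIED},
    IncidentStatus.IDENTIFIED: {IncidentStatus.MONITORING},
    IncidentStatus.MONITORING: {IncidentStatus.RESOLVED, IncidentStatus.INVESTIGATING},
    IncidentStatus.RESOLVED: set(),
}

def advance(current: IncidentStatus, new: IncidentStatus) -> IncidentStatus:
    """Validate a status change before it is announced to stakeholders."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.value} to {new.value}")
    return new
```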

Essential Phrases for Initial Incident Notification

The moment you detect a problem, your choice of words shapes the entire response trajectory. Opening statements must balance urgency with accuracy, providing enough information to trigger appropriate action without causing confusion. When raising an alert, consider phrases like "We are currently experiencing an outage affecting [specific service]" or "Our monitoring systems have detected abnormal behavior in [system name]." These constructions immediately establish what's wrong and which systems are involved without speculation about causes or timelines you cannot yet confirm.

For situations requiring immediate escalation, you might state: "This is a critical priority incident affecting all users across [region/platform]." The directness leaves no room for misinterpretation about severity. When the scope remains unclear during initial reporting, honest uncertainty serves you better than false precision: "We are investigating reports of service disruption and will provide updates as we gather more information." This approach manages expectations while buying your team the time needed for proper assessment.

Time-Sensitive Communication Structures

Immediate notification format: "Alert: [Service Name] experiencing [issue type]. Impact: [who/what affected]. Status: [current action]. Next update: [timeframe]." (see the sketch after this list)

Escalation trigger phrases: "Escalating to senior engineering team," "Activating incident response protocol," "Engaging vendor support at highest priority level"

Scope clarification statements: "Currently affecting approximately [number/percentage] of users," "Limited to [specific geography/platform/service tier]," "Impacting [list specific features or functions]"

Customer-facing announcements: "We are aware of an issue preventing access to [service] and are working urgently to restore normal operations"

Internal team coordination: "All hands on deck for P1 incident," "Need immediate assistance from [team name]," "Requesting emergency change approval for [action]"
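
As a concrete illustration, here is one way to fill in the immediate-notification format above programmatically. This is a minimal Python sketch; the service name, impact statement, and timeframe in the example are hypothetical placeholders.

```python
def format_initial_alert(service: str, issue: str, impact: str,
                         action: str, next_update: str) -> str:
    """Fill in the immediate-notification format described above."""
    return (f"Alert: {service} experiencing {issue}. "
            f"Impact: {impact}. "
            f"Status: {action}. "
            f"Next update: {next_update}.")

# Example usage with hypothetical values:
print(format_initial_alert(
    service="Checkout API",
    issue="elevated error rates",
    impact="approximately 30% of purchase attempts failing",
    action="engineering team investigating",
    next_update="within 30 minutes",
))
```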

"Never announce a timeline you cannot meet. Under-promise and over-deliver is not just good customer service—it's survival during incidents."

Technical Vocabulary for Describing System Failures

Precision in technical language prevents misunderstandings that can derail troubleshooting efforts. When servers fail, you might report a server crash, service unavailability, or unresponsive host—each term carries slightly different implications for diagnosis. A crash suggests a sudden failure requiring restart, while unresponsiveness might indicate network issues or resource exhaustion. Similarly, database connectivity issues differ from database corruption or query performance degradation, and using the correct term immediately points investigators in the right direction.

Network problems require their own specialized vocabulary. Packet loss describes data that fails to reach its destination, while latency spikes indicate delays in transmission even when data eventually arrives. DNS resolution failures prevent systems from translating domain names into IP addresses, and routing issues mean traffic cannot find proper paths through network infrastructure. When reporting these problems, specificity matters: "experiencing 40% packet loss on the primary link" provides actionable information, while "network is slow" does not.

Application and Service Layer Terminology

At the application level, you'll need to distinguish between various failure modes. An application hang means the software stops responding but hasn't crashed completely, while an application crash indicates complete failure requiring restart. Memory leaks describe situations where applications gradually consume all available RAM, eventually causing system instability. Deadlocks occur when processes wait indefinitely for resources held by each other, creating a standstill that requires intervention.

Performance issues require careful description to avoid ambiguity. Response time degradation means operations complete successfully but take longer than acceptable. Timeout errors indicate operations that exceed maximum wait times and fail as a result. Capacity constraints describe situations where systems cannot handle current load levels, while resource exhaustion means specific components—CPU, memory, disk space, or network bandwidth—have reached their limits. Each term guides troubleshooting toward different potential solutions.

| Failure Type | Technical Description | User-Facing Impact | Typical Diagnostic Approach |
|---|---|---|---|
| Service Timeout | Requests exceed maximum wait time threshold | Error messages, failed transactions | Check backend response times, database queries, network latency |
| HTTP 500 Errors | Internal server errors during request processing | Generic error pages, incomplete actions | Review application logs, check for code exceptions, verify dependencies |
| Connection Refused | Target service not accepting new connections | Cannot access service, "unable to connect" messages | Verify service is running, check firewall rules, confirm port availability |
| Certificate Errors | SSL/TLS certificate invalid, expired, or misconfigured | Security warnings, blocked access | Check certificate expiration, validate chain of trust, review configuration |
| Rate Limiting | Request volume exceeds configured thresholds | Intermittent failures, "too many requests" errors | Review traffic patterns, adjust limits, identify potential abuse |
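
The failure types above map naturally onto a simple health probe. The sketch below assumes the third-party requests library and a hypothetical health-check URL, and shows how each failure surfaces as a different exception or status code; a production check would add retries, logging, and latency measurement.

```python
import requests

def classify_failure(url: str) -> str:
    """Probe an endpoint and map the outcome to the failure types above."""
    try:
        response = requests.get(url, timeout=5)
    except requests.exceptions.SSLError:
        return "certificate error"
    except requests.exceptions.ConnectTimeout:
        return "service timeout (connect)"
    except requests.exceptions.ReadTimeout:
        return "service timeout (read)"
    except requests.exceptions.ConnectionError:
        return "connection refused or host unreachable"

    if response.status_code == 429:
        return "rate limiting"
    if 500 <= response.status_code < 600:
        return f"server error (HTTP {response.status_code})"
    return "healthy"

# Hypothetical endpoint used for illustration only:
print(classify_failure("https://status.example.com/health"))
```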

Communication Strategies for Different Stakeholder Groups

The same incident requires dramatically different communication approaches depending on your audience. Technical teams need detailed, jargon-rich information that enables rapid diagnosis and resolution. When briefing engineers, include specifics: "The load balancer is showing 503 errors for approximately 35% of requests to the authentication service, with error rates spiking since 14:23 UTC. Primary symptoms include connection timeouts after 30 seconds and intermittent successful authentications." This level of detail empowers technical responders to begin troubleshooting immediately without wasting time gathering basic information.

Executive stakeholders require a completely different approach. They need to understand business impact, customer exposure, and potential financial or reputational consequences without drowning in technical minutiae. An effective executive update might state: "We're experiencing a service disruption affecting approximately 20,000 customers in the Northeast region. The issue prevents users from completing purchases, with estimated revenue impact of $15,000 per hour. Our engineering team has identified the root cause and expects full restoration within 90 minutes." Notice how this version focuses on business metrics rather than technical details.

"Technical accuracy without appropriate context is just noise. Tailor every message to what your audience needs to know, not everything you know."

Customer Communication Best Practices

External customer communication demands the most careful crafting. Customers need acknowledgment, transparency about impact, and realistic expectations about resolution without technical explanations that might confuse or concern them unnecessarily. Effective customer messages follow a simple structure: acknowledge the problem, explain what's affected, communicate what you're doing about it, and set expectations for updates. For example: "We're aware that some customers are currently unable to access their account dashboards. Our team is actively working to resolve this issue. We'll provide an update within the next 30 minutes and expect to have service fully restored within two hours."

Avoid certain pitfalls in customer communication that can damage trust or create legal exposure. Never blame specific vendors or partners publicly, even if they caused the problem. Don't provide highly specific technical details that might reveal security vulnerabilities. Resist the temptation to minimize the issue—if customers are experiencing problems, validate their experience rather than suggesting it's minor. And never, ever promise a specific resolution time unless you're absolutely certain you can deliver.

Structured Incident Report Components

Professional incident reports follow a consistent structure that ensures no critical information gets overlooked. The incident summary provides a concise overview in 2-3 sentences, capturing what happened, when it occurred, and what was affected. This section appears first because many readers—especially executives—may read nothing else. Following the summary, the timeline of events documents key moments from detection through resolution, creating an objective record of the incident's progression. Timestamps must be precise and include time zones to avoid confusion across distributed teams.

The root cause analysis section explains why the incident occurred, distinguishing between the immediate trigger and underlying systemic issues. This analysis requires technical depth while remaining accessible to non-expert readers. A well-written root cause description might state: "The outage was triggered when a routine database maintenance script ran during peak traffic hours, consuming excessive CPU resources. However, the underlying cause was insufficient capacity planning that left no buffer for maintenance operations during high-load periods." This explanation identifies both the trigger and the deeper problem requiring attention.

Impact Assessment and Metrics

📊 Duration metrics: Time from initial detection to full resolution, including any periods of partial restoration

📊 User impact: Number and percentage of affected users, broken down by geography, customer tier, or other relevant segments

📊 Service availability: Percentage of uptime over the relevant measurement window, often evaluated against targets such as "three nines" (99.9%)

📊 Business impact: Revenue loss, transaction failures, customer complaints, or other quantifiable business consequences

📊 System metrics: Error rates, response times, resource utilization, and other technical measurements during the incident

The resolution steps section documents actions taken to restore service, providing a roadmap that helps with similar future incidents. Each step should be specific enough that another technician could reproduce the actions if necessary. Finally, the preventive measures section outlines changes being implemented to prevent recurrence—whether technical improvements, process changes, or additional monitoring. This forward-looking component transforms the incident from a failure into a learning opportunity.
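
One way to keep these components consistent across reports is to capture them in a fixed structure. The dataclass below is an illustrative skeleton only; the field names are not a standard schema, and organizations typically adapt them to their own templates.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class IncidentReport:
    """Skeleton mirroring the report sections described in this article."""
    summary: str                 # 2-3 sentences: what happened, when, what was affected
    timeline: List[str]          # timestamped events, with explicit time zones
    root_cause: str              # immediate trigger plus the underlying systemic issue
    impact: str                  # duration, users affected, availability, business cost
    resolution_steps: List[str]  # actions taken, specific enough to reproduce
    preventive_measures: List[str] = field(default_factory=list)  # changes to stop recurrence
```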

"The quality of your incident report directly correlates with your organization's ability to prevent the same problem from happening again."

Real-Time Update Cadences and Expectations

During active incidents, communication frequency becomes as important as content. Stakeholders experiencing service disruption need regular updates even when nothing has changed, because silence creates anxiety and speculation. For critical incidents, commit to updates every 15-30 minutes regardless of progress. These updates might simply confirm that investigation continues and provide a revised timeline, but they demonstrate active engagement and prevent stakeholders from assuming the worst. As incidents progress toward resolution, you can extend update intervals to every 60 minutes for major incidents or every 2-4 hours for minor issues.

Update content should follow a consistent format that stakeholders can quickly scan for essential information. Begin each update with the current status using standardized terminology. Follow with any new information discovered since the last update, even if it's simply confirmation that previous information remains accurate. Provide a clear statement of next steps and when stakeholders should expect the following update. Close with contact information for questions or escalations. This structure creates predictability that helps anxious stakeholders manage their own responses to the incident.
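
If you automate update reminders, the cadence can be encoded directly. The snippet below is a rough sketch with assumed intervals matching the guidance above; tune the numbers to your own severity definitions.

```python
# Illustrative cadence policy only; organizations tune these intervals.
UPDATE_INTERVAL_MINUTES = {
    "critical": 15,   # updates every 15-30 minutes while active
    "major": 60,      # hourly once the situation is understood
    "minor": 240,     # every few hours for low-impact issues
}

def next_update_due(severity: str, minutes_since_last_update: int) -> int:
    """Return how many minutes remain before the next update is owed."""
    interval = UPDATE_INTERVAL_MINUTES.get(severity, 60)
    return max(0, interval - minutes_since_last_update)

print(next_update_due("critical", 10))  # -> 5
```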

Escalation Triggers and Notification Protocols

Knowing when and how to escalate incidents separates experienced professionals from novices. Escalation triggers typically include time-based criteria—if an incident isn't resolved within specific timeframes based on severity—and impact-based criteria, such as when problems affect more users than initially estimated or when business-critical functions remain unavailable. Clear escalation paths prevent delays during critical moments. Your incident report should document when escalations occurred and why, creating accountability and helping refine escalation criteria over time.

Notification protocols specify who receives alerts at each escalation level and through which channels. Primary on-call engineers receive initial notifications via paging systems that ensure they respond even outside business hours. Secondary escalations might add team leads or managers to the communication chain. Executive escalations—reserved for the most severe incidents—bring senior leadership into the response. Document these protocols clearly in your incident management procedures, and test them regularly to ensure contact information remains current and notification systems function reliably.
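
Time-based and impact-based triggers are straightforward to encode so that escalation does not depend on someone remembering to check the clock. The thresholds in this sketch are hypothetical examples, not recommendations.

```python
from datetime import timedelta

# Hypothetical time-based escalation thresholds per severity level.
ESCALATION_AFTER = {
    "P1": timedelta(minutes=30),
    "P2": timedelta(hours=2),
    "P3": timedelta(hours=8),
}

def should_escalate(severity: str, elapsed: timedelta,
                    users_affected: int, estimated_users: int) -> bool:
    """Escalate on a time-based trigger or when impact exceeds the initial estimate."""
    time_trigger = elapsed >= ESCALATION_AFTER.get(severity, timedelta(hours=4))
    impact_trigger = users_affected > estimated_users
    return time_trigger or impact_trigger

print(should_escalate("P1", timedelta(minutes=45),
                      users_affected=5000, estimated_users=2000))  # -> True
```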

Post-Incident Review and Documentation

Once service restoration is complete, the real learning begins. The post-incident review or postmortem examines what happened, why it happened, how the team responded, and what needs to change. Effective postmortems adopt a blameless culture that focuses on systemic improvements rather than individual fault-finding. The goal is organizational learning, not punishment. When conducting these reviews, gather all stakeholders involved in the incident response, walk through the timeline collaboratively, and identify specific action items with assigned owners and deadlines.

The written postmortem document serves multiple purposes: it creates institutional memory, satisfies compliance requirements, demonstrates accountability to customers and executives, and provides a foundation for process improvements. Structure your postmortem around key questions: What was the customer impact? What was the root cause? Why did our systems not detect or prevent this issue? How did we respond, and what could we have done better? What are we changing to prevent recurrence? Each question drives toward actionable insights rather than mere description of events.

"The incidents you learn from become your competitive advantage. The incidents you repeat become your reputation."

Action Items and Follow-Through

Postmortems generate action items across multiple categories. Technical remediation includes code fixes, infrastructure improvements, or architectural changes that address root causes. Monitoring enhancements add visibility into blind spots revealed by the incident. Process improvements update runbooks, escalation procedures, or communication templates based on lessons learned. Training initiatives address skill gaps or knowledge deficits exposed during the response. Each action item requires an owner, a deadline, and a mechanism for tracking completion—without accountability, even the best intentions fade.

Measuring the effectiveness of post-incident improvements requires tracking specific metrics over time. Monitor the recurrence rate of similar incidents to verify that preventive measures actually work. Track mean time to detection (MTTD) and mean time to resolution (MTTR) to assess whether process improvements accelerate response. Survey incident responders about the clarity of procedures and the adequacy of tools and resources. These measurements transform vague commitments to "do better" into concrete evidence of organizational learning.
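
Computing these metrics requires only the key timestamps from each incident record. The sketch below uses purely illustrative timestamps, measuring MTTD from fault start to detection and MTTR from detection to resolution; definitions vary, so state yours explicitly when reporting.

```python
from datetime import datetime
from statistics import mean

# Each record holds when the fault began, when it was detected, and when it was resolved.
# The timestamps below are illustrative only.
incidents = [
    {"started": datetime(2024, 1, 5, 14, 0), "detected": datetime(2024, 1, 5, 14, 8),
     "resolved": datetime(2024, 1, 5, 15, 30)},
    {"started": datetime(2024, 2, 11, 9, 15), "detected": datetime(2024, 2, 11, 9, 20),
     "resolved": datetime(2024, 2, 11, 10, 0)},
]

mttd = mean((i["detected"] - i["started"]).total_seconds() / 60 for i in incidents)
mttr = mean((i["resolved"] - i["detected"]).total_seconds() / 60 for i in incidents)

print(f"MTTD: {mttd:.1f} minutes, MTTR: {mttr:.1f} minutes")
```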

Industry-Specific Terminology and Contexts

Different industries apply specialized terminology to incident reporting that reflects their unique operational contexts. In telecommunications, you'll encounter terms like call drop rate, network congestion, and signal degradation that describe service quality issues specific to voice and data networks. Financial services organizations discuss transaction processing failures, settlement delays, and regulatory reporting impacts because their incidents carry compliance and monetary consequences beyond simple service availability. Healthcare IT professionals must consider patient safety implications, HIPAA breach notifications, and clinical workflow disruptions that can literally be matters of life and death.

Cloud service providers have developed particularly sophisticated incident taxonomies because they serve diverse customers with varying needs. Terms like availability zone failure, region-wide outage, and service degradation in specific availability zones communicate precise geographic and architectural scope. Control plane issues affect the ability to manage resources, while data plane problems impact actual workload execution. Understanding these distinctions helps you communicate effectively whether you're a cloud provider, a customer of cloud services, or both.

Regulatory and Compliance Considerations

Certain industries face regulatory requirements that shape incident reporting obligations. Financial institutions must report significant operational disruptions to regulators within specified timeframes, often measured in hours. Healthcare organizations must evaluate whether incidents constitute breaches of protected health information requiring patient notification. Telecommunications providers face obligations to report outages affecting emergency services like 911. These regulatory contexts add legal dimensions to incident communication, requiring careful language that satisfies compliance obligations without creating unnecessary legal exposure.

When incidents involve potential security breaches or data exposure, terminology becomes particularly sensitive. Terms like unauthorized access, data exfiltration, and security incident trigger specific legal and regulatory processes. Many organizations maintain separate procedures for security incidents versus operational outages because the response requirements, notification obligations, and stakeholder groups differ significantly. Your incident classification systems should clearly distinguish between security and availability incidents to ensure appropriate protocols activate automatically.

Tools and Technologies for Incident Management

Modern incident management relies on specialized tools that streamline communication and coordination during high-pressure situations. Incident management platforms like PagerDuty, Opsgenie, or VictorOps centralize alerting, escalation, and communication workflows. These systems integrate with monitoring tools to automatically create incidents when problems are detected, route notifications to appropriate responders based on schedules and escalation policies, and provide a centralized hub for all incident-related communication. Familiarity with these platforms and their terminology—like incident acknowledgment, escalation policies, and on-call rotations—is essential for modern technical professionals.

Status page services such as Statuspage.io or Sorry™ enable transparent external communication during incidents. These platforms allow you to post updates that customers can subscribe to, reducing the volume of support inquiries during outages. When using status pages, maintain consistency between internal and external messaging—customers quickly notice if your public statements contradict information they're hearing through other channels. Status pages also create a public record of your reliability and transparency, so treat them with appropriate care and professionalism.

Collaboration and Communication Channels

Real-time collaboration tools like Slack, Microsoft Teams, or dedicated incident response channels facilitate coordination among distributed teams. Many organizations create specific incident channels or war rooms for each major incident, providing a focused space for technical discussion separate from general team communications. These channels become valuable historical records when conducting post-incident reviews, capturing the real-time thinking and decision-making that led to resolution. Establish clear conventions for these channels—like using threads for side discussions or pinning key updates—to prevent information overload during high-stress situations.

Video conferencing becomes essential during complex incidents requiring real-time coordination across multiple teams or locations. Platforms like Zoom, Google Meet, or Microsoft Teams enable screen sharing for collaborative troubleshooting, face-to-face communication that reduces misunderstandings, and recording capabilities that support post-incident analysis. However, don't let tool proliferation create confusion—establish clear protocols about which tools to use for which purposes, and ensure all team members have access and familiarity before incidents occur.

"Your incident response is only as good as your weakest communication channel. Test your tools before you need them."

Cultural and Linguistic Considerations in Global Operations

Organizations operating across multiple countries and time zones face additional complexity in incident communication. Language barriers can delay understanding and response, particularly when incidents occur during off-hours requiring coordination between teams in different regions. Establish common operating languages for incident response—typically English in international contexts—while recognizing that non-native speakers may need additional time to process complex technical information under pressure. Consider providing incident templates and common phrases in multiple languages to support rapid communication regardless of who's on call.

Cultural differences influence communication styles in ways that matter during incidents. Some cultures value direct, explicit communication, while others prefer more contextual, indirect approaches. High-context cultures may assume shared understanding that doesn't actually exist, while low-context cultures might provide excessive detail that obscures key points. Build awareness of these differences into your incident response training, and encourage team members to adapt their communication styles to their audiences. When in doubt, err on the side of clarity and explicitness—incidents are not the time for subtle communication.

Time Zone Coordination and Handoffs

Managing incidents across time zones requires careful attention to temporal references and coordination points. Always specify time zones when communicating timestamps—"14:00" means nothing without knowing whether it's UTC, EST, or JST. Many global operations teams adopt UTC as their standard reference time to eliminate ambiguity. When incidents span multiple shifts or follow-the-sun support models, formal handoff procedures ensure continuity as responsibility transfers between teams. These handoffs should include current status, actions taken, pending tasks, and any context that might not be obvious from written documentation alone.
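
The same discipline applies in tooling: store and exchange timestamps with explicit zone information, normalizing to UTC. Below is a small example using Python's standard zoneinfo module (3.9+), with an arbitrary detection time chosen for illustration.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# An engineer in New York logs a detection time; always record and share it in UTC.
local_detection = datetime(2024, 3, 14, 14, 0, tzinfo=ZoneInfo("America/New_York"))
utc_detection = local_detection.astimezone(ZoneInfo("UTC"))

print(local_detection.strftime("%Y-%m-%d %H:%M %Z"))  # 2024-03-14 14:00 EDT
print(utc_detection.strftime("%Y-%m-%d %H:%M %Z"))    # 2024-03-14 18:00 UTC
```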

Schedule overlaps between time zones create valuable windows for real-time collaboration. When possible, time major changes or risky operations during periods when multiple regions have coverage, providing deeper bench strength if problems arise. Document these optimal windows in your change management procedures, and use them strategically for activities with higher incident risk. This proactive scheduling reduces the likelihood of incidents occurring when only skeleton crews are available to respond.

Emerging Trends in Incident Management

The landscape of incident management continues evolving as technology and organizational practices advance. AIOps (Artificial Intelligence for IT Operations) platforms increasingly augment human responders by automatically correlating events, identifying patterns, and suggesting remediation actions. These systems don't replace human judgment but enhance it by processing vast amounts of data faster than any person could. As these tools mature, incident reporting will increasingly include AI-generated insights alongside human analysis, requiring new skills in interpreting and validating algorithmic recommendations.

The shift toward chaos engineering and proactive failure testing changes how organizations think about incidents. Rather than waiting for problems to occur naturally, teams deliberately introduce failures in controlled ways to verify that systems respond appropriately and that teams can execute incident response procedures effectively. This practice normalizes incidents as learning opportunities rather than failures, gradually building organizational resilience. Incident reports from chaos engineering exercises look similar to those from unplanned outages but carry different emotional valence—they're evidence of maturity rather than markers of problems.

Automation and Self-Healing Systems

Increasing automation reduces human involvement in routine incident response, allowing people to focus on complex situations requiring judgment and creativity. Self-healing systems automatically detect and remediate common problems without human intervention—restarting failed services, scaling resources to meet demand, or failing over to backup systems. As these capabilities expand, the nature of incident reporting shifts toward oversight and validation rather than hands-on troubleshooting. You'll need to report on what automated systems did, verify that their actions were appropriate, and identify situations where automation failed or proved insufficient.
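
A self-healing loop can be as simple as a watchdog that checks a service and restarts it when the check fails. The sketch below assumes a hypothetical systemd unit named example-api and uses systemctl for the check and restart; real implementations add backoff, alerting when restarts repeat, and a record of every automated action for the incident log.

```python
import subprocess
import time

SERVICE = "example-api"          # hypothetical systemd unit name
CHECK_INTERVAL_SECONDS = 60

def service_is_healthy() -> bool:
    """Ask systemd whether the unit is active; a real check would also probe the endpoint."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", SERVICE])
    return result.returncode == 0

def watchdog_once() -> None:
    """One self-healing pass: restart the unit and note the action for later review."""
    if not service_is_healthy():
        subprocess.run(["systemctl", "restart", SERVICE])
        print(f"{SERVICE} was down; automatic restart issued")

if __name__ == "__main__":
    while True:
        watchdog_once()
        time.sleep(CHECK_INTERVAL_SECONDS)
```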

This automation trend doesn't eliminate the need for strong communication skills—it actually elevates their importance. As routine problems get handled automatically, the incidents that reach human responders become increasingly complex and ambiguous. These situations demand even more sophisticated communication to coordinate across specialized teams, explain nuanced technical issues to diverse stakeholders, and build consensus around appropriate responses when standard procedures don't apply. The future belongs to technical professionals who combine deep expertise with exceptional communication abilities.

Frequently Asked Questions

What is the difference between an incident and an outage?

An incident is any unplanned interruption or reduction in quality of service, which might include performance degradation, partial functionality loss, or intermittent issues. An outage specifically refers to complete unavailability of a service—users cannot access it at all. All outages are incidents, but not all incidents are outages. This distinction matters for severity classification and response prioritization.

How quickly should I send the first communication after detecting an incident?

For critical incidents, aim to send initial notification within 5-10 minutes of confirmed detection. This first message can be brief, simply acknowledging the problem and confirming that investigation is underway. For major incidents, 15-20 minutes is acceptable. The key is balancing speed with accuracy—don't delay while gathering perfect information, but verify the basics before announcing. Set expectations for when the next update will arrive.

Who should receive incident notifications?

Notification distribution depends on incident severity and impact. Technical responders always receive alerts for incidents affecting their systems. For critical incidents, expand notifications to include management, customer support teams, and potentially executives. Customer-facing communications go to affected users through status pages, email, or in-app notifications. Establish tiered notification protocols in advance so you're not making these decisions under pressure during active incidents.

What should I do if I don't know the cause of an incident yet?

Communicate what you do know: which services are affected, what symptoms users are experiencing, and that investigation is actively underway. It's perfectly acceptable to say "We are investigating the root cause and will provide updates as we learn more." Never speculate about causes before confirming them, as incorrect early statements can misdirect troubleshooting efforts and damage credibility if you later need to correct them.

How detailed should customer-facing incident communications be?

Customer communications should focus on impact and resolution rather than technical details. Explain what isn't working, approximately how many customers are affected, what you're doing to fix it, and when they can expect updates or resolution. Avoid technical jargon, internal system names, or details that might confuse or concern customers unnecessarily. Save technical depth for internal reports and post-incident reviews.

What is a post-incident review and why is it important?

A post-incident review (also called a postmortem) is a structured analysis conducted after an incident is resolved. It examines what happened, why it happened, how the team responded, and what should change to prevent recurrence. These reviews are crucial for organizational learning, turning incidents from pure negatives into opportunities for improvement. Effective reviews adopt a blameless approach focused on systemic issues rather than individual fault.

How long should I continue monitoring after an incident is resolved?

Continue enhanced monitoring for at least 24-48 hours after resolution for major incidents, and potentially longer for critical outages. Many problems recur shortly after initial fixes as systems return to normal load or as temporary workarounds prove insufficient. This monitoring period also helps verify that your resolution addressed root causes rather than just symptoms. Document the monitoring period in your incident report and communicate clearly when you're transitioning from active incident response to normal operations.

What metrics should I track for incident management effectiveness?

Key metrics include Mean Time to Detect (MTTD), Mean Time to Acknowledge (MTTA), Mean Time to Resolve (MTTR), incident frequency, recurrence rates, and customer impact measurements. Track these metrics over time to identify trends and measure the effectiveness of improvements. Also monitor qualitative factors like stakeholder satisfaction with communications and team confidence in incident procedures. Balanced scorecards combining technical and communication metrics provide the most complete picture of incident management maturity.