Overview
This prompt guides expert incident report writers in creating detailed reports for outages. Stakeholders, including management and technical teams, will benefit from clear insights and actionable recommendations.
Prompt Overview
Purpose: This report aims to provide a comprehensive analysis of the recent incident, outlining its impact and corrective measures.
Audience: The intended audience includes stakeholders, management, and technical teams who require insights into the incident for future prevention.
Distinctive Feature: This report emphasizes actionable recommendations based on data-driven insights to enhance system reliability and incident response.
Outcome: The goal is to foster continuous improvement and proactive risk management to prevent similar incidents in the future.
Quick Specs
- Media: Text
- Use case: Incident reporting
- Techniques: Root cause analysis, impact assessment
- Models: Incident timeline, corrective actions
- Estimated time: 2-4 hours
- Skill level: Expert
Variables to Fill
- [DETAILED DESCRIPTION OF THE ISSUE OR OUTAGE] – Detailed Description Of The Issue Or Outage
- [OUTAGE START TIME] – Outage Start Time
- [OUTAGE END TIME] – Outage End Time
- [LIST OF SERVICES AFFECTED BY THE INCIDENT] – List Of Services Affected By The Incident
- [issue] – Issue
- [outage_start_time] – Outage Start Time
- [outage_end_time] – Outage End Time
- [services_impacted] – Services Impacted
- [time] – Time
- [event] – Event
- [systems_affected] – Systems Affected
- [estimated_downtime] – Estimated Downtime
- [transactions_lost] – Transactions Lost
- [revenue_impact] – Revenue Impact
- [user_impact] – User Impact
- [proximate_cause] – Proximate Cause
- [underlying_factors] – Underlying Factors
- [root_cause_explanation] – Root Cause Explanation
- [immediate_fixes] – Immediate Fixes
- [process_improvements] – Process Improvements
- [monitoring_enhancements] – Monitoring Enhancements
- [long_term_remediations] – Long Term Remediations
- [lessons_learned] – Lessons Learned
- [action_items] – Action Items
Example Variables Block
- [DETAILED DESCRIPTION OF THE ISSUE OR OUTAGE]: Service outage affecting multiple users
- [OUTAGE START TIME]: 2023-10-01T10:00:00Z
- [OUTAGE END TIME]: 2023-10-01T12:00:00Z
- [LIST OF SERVICES AFFECTED BY THE INCIDENT]: Web application, API services
- [issue]: Service outage impacting user access
- [outage_start_time]: 2023-10-01T10:00:00Z
- [outage_end_time]: 2023-10-01T12:00:00Z
- [services_impacted]: Web application, API services
- [time]: 2023-10-01T10:00:00Z
- [event]: Outage detected by monitoring system
- [systems_affected]: Web servers, database servers
- [estimated_downtime]: 2 hours
- [transactions_lost]: 500 transactions
- [revenue_impact]: $2000
- [user_impact]: 1000 users affected
- [proximate_cause]: Database connection failure
- [underlying_factors]: Inadequate server resources
- [root_cause_explanation]: Database overloaded due to high traffic
- [immediate_fixes]: Restarted database server
- [process_improvements]: Increased server capacity
- [monitoring_enhancements]: Added alerts for database load
- [long_term_remediations]: Implemented load balancing
- [lessons_learned]: Need for better resource management
- [action_items]: Review server capacity by 2023-10-15
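Filling the bracketed variables by hand is error-prone when there are this many of them. The following is a minimal sketch of one way to substitute them programmatically and to flag any that were left unfilled. The `fill_placeholders` helper and the shortened template are hypothetical illustrations, not part of the prompt itself; the values come from the example block above.

```python
import re

# Matches any [name] placeholder, e.g. [issue] or [OUTAGE START TIME].
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_placeholders(template: str, values: dict[str, str]) -> tuple[str, list[str]]:
    """Return the filled template and a sorted list of placeholders left unfilled."""
    missing: set[str] = set()

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in values:
            return values[name]
        missing.add(name)
        return match.group(0)  # leave unknown placeholders untouched

    return PLACEHOLDER.sub(substitute, template), sorted(missing)


# Shortened, illustrative template using placeholder names from the list above.
template = (
    "- Issue: [issue]\n"
    "- Outage Start Time: [outage_start_time]\n"
    "- Outage End Time: [outage_end_time]\n"
    "- Services Impacted: [services_impacted]\n"
    "- Revenue Impact: [revenue_impact]"
)
values = {
    "issue": "Service outage impacting user access",
    "outage_start_time": "2023-10-01T10:00:00Z",
    "outage_end_time": "2023-10-01T12:00:00Z",
    "services_impacted": "Web application, API services",
}

filled, missing = fill_placeholders(template, values)
print(filled)
print("Unfilled:", missing)
```

Reporting the unfilled names, rather than silently dropping them, makes it easier to spot a variable that was overlooked before the prompt is sent.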
The Prompt
You are an expert incident report writer tasked with creating a comprehensive report detailing a specific issue or outage. The report must cover the incident timeline, impact assessment, root cause analysis, and corrective actions taken. The goal is to provide a thorough, insightful, and actionable account of the incident.
ROLE: As an expert incident report writer, you possess deep knowledge of root cause analysis, incident management, and technical writing. Your role is to analyze the provided information about the incident and craft a clear, detailed report that will help stakeholders understand:
- What happened
- Why it happened
- What steps are being taken to prevent similar issues in the future
The incident report should be organized into the following sections:
- Incident Summary
  - Provide a brief overview of the issue, including:
    - Outage start and end times
    - Services impacted
- Incident Timeline
  - Detail the sequence of events leading up to, during, and after the incident.
  - Include specific times and relevant actions taken at each stage.
- Impact Assessment
  - Analyze the effects of the incident on various systems, including:
    - Estimated downtime
    - Transactions lost
    - Revenue impact
    - User impact
  - Use quantitative measures where possible to convey the scale of the impact.
- Root Cause Analysis
  - Identify the proximate cause of the incident and any underlying factors that contributed to it.
  - Provide a clear explanation of the root cause, connecting the dots between contributing factors and the ultimate issue.
- Corrective Actions
  - Outline the immediate fixes implemented to resolve the incident, as well as:
    - Any process improvements
    - Monitoring enhancements
    - Long-term remediations planned to prevent recurrence
  - Prioritize actions based on their potential impact and feasibility.
- Incident Follow-Up
  - Summarize key lessons learned from the incident.
  - List specific action items to be completed, along with their owners and due dates.
  - Emphasize the importance of continuous improvement and proactive risk management.
When writing the report, follow these guidelines:
- The report should be comprehensive, covering all relevant aspects of the incident from detection through resolution and follow-up.
- Use clear, concise language and avoid technical jargon where possible to ensure accessibility for a wide audience.
- Focus on objective facts and data-driven insights, rather than speculation or opinion.
- Prioritize actionable recommendations that will have a meaningful impact on system reliability and incident response capabilities.
- Maintain a neutral, professional tone throughout the report, focusing on learning and improvement rather than assigning blame.
The incident details are as follows:
- Issue Description: [DETAILED DESCRIPTION OF THE ISSUE OR OUTAGE]
- Outage Start Time: [OUTAGE START TIME]
- Outage End Time: [OUTAGE END TIME]
- Services Impacted: [LIST OF SERVICES AFFECTED BY THE INCIDENT]
The incident report should be formatted as follows:
Incident Summary
- Issue: [issue]
- Outage Start Time: [outage_start_time]
- Outage End Time: [outage_end_time]
- Services Impacted: [services_impacted]
Incident Timeline
- [time]: [event]
- [time]: [event]
- [time]: [event]
Impact Assessment
- Systems Affected: [systems_affected]
- Estimated Downtime: [estimated_downtime]
- Transactions Lost: [transactions_lost]
- Revenue Impact: [revenue_impact]
- User Impact: [user_impact]
Root Cause Analysis
- Proximate Cause: [proximate_cause]
- Underlying Factors: [underlying_factors]
- Explanation: [root_cause_explanation]
Corrective Actions
- Immediate Fixes: [immediate_fixes]
- Process Improvements: [process_improvements]
- Monitoring Enhancements: [monitoring_enhancements]
- Long-term Remediations: [long_term_remediations]
Incident Follow-Up
- Lessons Learned: [lessons_learned]
- Action Items: [action_items]
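The timeline portion of this format repeats one `[time]: [event]` line per event, in chronological order. As a rough sketch of how such entries might be sorted and rendered before being pasted into the report, consider the following. The `render_timeline` helper is hypothetical, and the second and third events are invented for illustration; only the detection event comes from the example variables.

```python
from datetime import datetime


def parse_utc(timestamp: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def render_timeline(entries: list[tuple[str, str]]) -> str:
    """Sort (time, event) pairs chronologically and render one bullet per event."""
    ordered = sorted(entries, key=lambda entry: parse_utc(entry[0]))
    return "\n".join(f"- {time}: {event}" for time, event in ordered)


entries = [
    ("2023-10-01T12:00:00Z", "Service restored"),           # illustrative event
    ("2023-10-01T10:00:00Z", "Outage detected by monitoring system"),
    ("2023-10-01T10:20:00Z", "Database server restarted"),  # illustrative event
]

print(render_timeline(entries))
```

Sorting on parsed timestamps rather than on the raw strings keeps the order correct even if entries are recorded with different UTC offsets.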
How to Use This Prompt
- [outage_start_time]: Time when the outage began.
- [outage_end_time]: Time when the outage was resolved.
- [services_impacted]: List of affected services.
- [estimated_downtime]: Duration of the service interruption.
- [transactions_lost]: Number of transactions affected.
- [revenue_impact]: Financial loss due to the outage.
- [proximate_cause]: Immediate cause of the incident.
- [lessons_learned]: Key insights gained from the incident.
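The estimated downtime follows directly from the outage start and end times, so it can be derived rather than entered separately. The sketch below assumes both times are ISO 8601 strings in UTC, as in the example variables; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_utc(timestamp: str) -> datetime:
    # Normalise a trailing "Z" so this also works before Python 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def estimated_downtime(start: str, end: str) -> timedelta:
    """Return the outage duration, rejecting an end time before the start time."""
    delta = parse_utc(end) - parse_utc(start)
    if delta < timedelta(0):
        raise ValueError("outage end time precedes outage start time")
    return delta


def format_duration(delta: timedelta) -> str:
    """Format a duration as whole hours and minutes."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} h {minutes:02d} min"


downtime = estimated_downtime("2023-10-01T10:00:00Z", "2023-10-01T12:00:00Z")
print(format_duration(downtime))  # 2 h 00 min
```

Deriving the figure this way also catches swapped or mistyped timestamps before they reach the report.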
Tips for Best Results
- Clear Overview: Start with a concise summary of the incident, including key details like start and end times, and services affected.
- Detailed Timeline: Document the sequence of events meticulously, noting specific times and actions taken to provide clarity on the incident’s progression.
- Thorough Impact Analysis: Assess the incident’s effects quantitatively, covering downtime, lost transactions, and user impact to illustrate the scale of disruption.
- Actionable Follow-Up: Summarize lessons learned and outline specific action items with assigned owners and deadlines to ensure accountability and improvement.
FAQ
- What is the incident summary?
In the example above, the incident was a service outage from 10:00 AM to 12:00 PM UTC that affected the web application and API services.
- What does the incident timeline include?
The timeline details events from detection at 10:00 AM to resolution at 12:00 PM.
- How was the impact assessed?
The impact assessment covered 2 hours of downtime, 500 lost transactions, and an estimated revenue impact of $2000.
- What corrective actions were taken?
Immediate fixes were implemented, alongside process improvements and enhanced monitoring for future prevention.
Compliance and Best Practices
- Best Practice: Review AI output for accuracy and relevance before use.
- Privacy: Avoid sharing personal, financial, or confidential data in prompts.
- Platform Policy: Your use of AI tools must comply with their terms and your local laws.
Revision History
- Version 1.0 (December 2025): Initial release.
