How do you set up an effective IT incident management process?
From a blocked printer to an application that has stopped working, your IT system experiences many incidents, some more critical than others. Hence the importance of implementing an incident management process.
But how can you ensure that your incident management procedure is effective? What resolution stages should you define, and how should you determine the roles of each person in your process? Is it possible to provide a satisfactory solution for the user, in line with your SLA (Service Level Agreement), and within reasonable timescales?
To help you achieve greater efficiency and consistency, Appvizer explains the principles and stages of the ITIL framework in this article, and reminds you of the benefits to be gained from this method of working.
What is IT incident management?
Most IT incidents are managed in accordance with the Information Technology Infrastructure Library (ITIL) framework.
But what exactly is ITIL?
Developed in the 1980s by the UK government's Central Computer and Telecommunications Agency (CCTA), which was later absorbed into the Office of Government Commerce, ITIL is a set of documents listing best practices for managing IT services. The aim is to give professionals methodological support, with a view to continuous improvement.
The ITIL framework covers a number of themes (organisation of the information system, configuration management, change management, etc.), including incident management. It defines an incident as follows:
An incident is defined as any event which is not part of the standard operation of a service and which causes, or may cause, an interruption or a reduction in the quality of this service.
💡 This definition covers different types of incident:
- Software or application incidents. Examples:
  - program error slowing down the user,
  - application slowdown, etc.
- Hardware incidents. Examples:
  - printer output blocked,
  - hard disk nearly full, etc.
- Service requests. Examples:
  - forgotten password,
  - request for specific documentation, etc.
Incident management vs problem management
Incident management is often confused with problem management. However, they involve different procedures.
According to ITIL, problem management is used to:
Minimise the negative impact on the company's activities of incidents and problems caused by errors in the IT infrastructure, and prevent the recurrence of incidents induced by these errors.
➡️ In other words, problem management is more proactive, while incident management is more reactive.
However, the two processes work in parallel, with problem management operating through the identification of recurring incidents.
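As a minimal sketch of how recurring incidents might be spotted, the Python snippet below counts incidents per category and flags the categories that reach a threshold. The category labels, the `recurring_categories` helper and the default threshold of 3 are illustrative assumptions, not something ITIL prescribes.

```python
from collections import Counter
from typing import Iterable, List


def recurring_categories(categories: Iterable[str], threshold: int = 3) -> List[str]:
    """Return the categories seen at least `threshold` times, sorted by name.

    Both the idea of a fixed threshold and its default value are assumptions;
    each organisation would define its own rule.
    """
    counts = Counter(categories)
    return sorted(name for name, count in counts.items() if count >= threshold)


# Hypothetical incident log, reduced to one category label per incident.
incident_log = ["printer", "vpn", "printer", "disk", "printer", "vpn"]

# Categories that may deserve a problem management investigation.
problem_candidates = recurring_categories(incident_log)
print(problem_candidates)
```

Each flagged category is only a candidate: it is then up to the problem management process to look for a common root cause.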
Why is incident management important?
A standardised process for managing your incidents generates numerous benefits for your company 🤩:
- It limits the impact of incidents on the company and the business, which can be critical, by resolving them faster;
- It greatly simplifies the procedure, for example by avoiding back-and-forth emails;
- It allows recurring incidents to be identified, enabling the problem management process mentioned above to be deployed;
- It improves the quality of the business knowledge base by setting up databases for handling incidents;
- It provides transparency within the organisation regarding the resolution of incidents;
- It increases user satisfaction and the productivity of everyone in the company.
☝️ Bear in mind that an incident management process goes beyond simply resolving an IT problem. It provides solid support for the company's business functions by reducing the slowdowns and stoppages that would otherwise hurt turnover.
Example of a 5-step incident management procedure
#1 Identifying and recording the incident
To begin with, the incident must be identified, specifying:
- its name and number
- the identity of the person responsible
- the date on which the incident occurred
- and above all its characteristics (nature, seriousness and impact on operations).
E.g.: a server breakdown affecting several departments will be considered a major incident, whereas a connection problem at a single workstation will be considered less critical.
It is up to the department responsible to record these details in the tool of its choice (software, spreadsheet, form, etc.) and to report the incident to the support teams that will deal with it in accordance with the procedure.
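To make the recording step more concrete, here is a minimal sketch of what an incident record could look like in Python. The `IncidentRecord` class, its field names and the two-level `Severity` scale are hypothetical; they simply mirror the details listed above and should be adapted to your own tool.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Severity(Enum):
    """Hypothetical two-level scale; real organisations often use more levels."""
    MINOR = 1
    MAJOR = 2


@dataclass
class IncidentRecord:
    number: int          # incident number
    name: str            # short name of the incident
    responsible: str     # identity of the person responsible
    occurred_on: date    # date on which the incident occurred
    nature: str          # e.g. "hardware" or "software"
    severity: Severity   # seriousness of the incident
    impact: str          # impact on operations, as free text


# Example values are invented for illustration.
server_outage = IncidentRecord(
    number=1,
    name="Server breakdown",
    responsible="J. Doe",
    occurred_on=date(2024, 1, 15),
    nature="hardware",
    severity=Severity.MAJOR,
    impact="Several departments cannot work",
)
print(server_outage)
```

Whatever the medium (software, spreadsheet or form), capturing the same fields for every incident is what makes later classification and reporting possible.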
#2 Incident classification and analysis
The incident is then classified according to an order of priority defined in advance and specific to your organisation, based for example on the impact on the business and the urgency of the situation.
For example, a network failure could be classified as a "connectivity" incident, with a "high" severity level if it paralyses the entire company.
At the same time, an initial analysis is carried out to determine the possible causes of the incident. Diagnostic tools or even previous experience can be used for this assessment.
☝️ Note that if this is a service request, you must follow the associated procedure.
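The prioritisation by impact and urgency described in this step can be sketched as a small lookup function. This is only an illustration: the three-level scale, the `priority` function and the scoring rule are assumptions, since each organisation defines its own matrix.

```python
# Hypothetical three-level scale shared by impact and urgency.
LEVELS = ("low", "medium", "high")


def priority(impact: str, urgency: str) -> str:
    """Combine impact and urgency into a priority level.

    The rule used here (adding the two ranks) is an assumption made for
    the example, not a rule taken from ITIL.
    """
    score = LEVELS.index(impact) + LEVELS.index(urgency)
    if score >= 3:
        return "high"
    if score == 2:
        return "medium"
    return "low"


# A network failure that paralyses the entire company.
print(priority("high", "high"))
# A connection problem at a single workstation that can wait.
print(priority("low", "medium"))
```

Writing the rule down, even in this simple form, helps ensure that two people classifying the same incident reach the same priority.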
#3 Investigating and diagnosing the incident
All the information relating to the incident is analysed, with the aim of resolving it and restoring the service as quickly as possible. The teams in charge of this work use various methods, from log analysis to real-time tests.
👉 E.g.: if a server breaks down, the team will consult the event logs for critical errors or use monitoring tools to check hardware performance.
Be aware that the first level of support is sometimes unable to resolve the incident. This triggers an escalation: the incident is transferred to the next support level.
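The escalation mechanism can be illustrated with a short Python sketch. Each support level is modelled as a function that either returns a resolution or returns `None` to pass the incident on. The names `handle`, `level_1` and `level_2`, and the keyword matching they use, are invented for the example.

```python
from typing import Callable, Optional, Sequence, Tuple

# A support level takes an incident description and returns a resolution,
# or None when it cannot resolve the incident (hypothetical model).
SupportLevel = Callable[[str], Optional[str]]


def handle(incident: str, levels: Sequence[SupportLevel]) -> Tuple[int, str]:
    """Try each support level in order and escalate until one resolves the incident."""
    for level_number, level in enumerate(levels, start=1):
        resolution = level(incident)
        if resolution is not None:
            return level_number, resolution
    raise RuntimeError("no support level could resolve the incident")


def level_1(incident: str) -> Optional[str]:
    # First-level support only handles password issues in this example.
    return "password reset" if "password" in incident else None


def level_2(incident: str) -> Optional[str]:
    # Second-level support handles server issues in this example.
    return "server restarted" if "server" in incident else None


print(handle("forgotten password", [level_1, level_2]))
print(handle("server down", [level_1, level_2]))
```

In practice the decision to escalate is usually also driven by elapsed time and SLA targets, which this sketch leaves out.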
#4 Incident resolution and return to service
Incident resolution takes various forms:
- the incident is repaired immediately. It has been resolved and operations resume as normal;
- a workaround has been found. The priority of incident management is to restore services quickly, so if the workaround is not perfect but makes the situation "acceptable", the process has done its job.
☝️ Note that if the underlying causes of several incidents are unknown, but the incidents seem to share the same origin, it is recommended that a problem management process be initiated. Remember that incident and problem management flows often intersect.
#5 Closing the incident
To close an incident properly, the teams in charge of the process carry out a number of actions:
- They take care to record all the details of the incident and the time spent on it. ☝️ This documentation is used to create a history that can be consulted to improve protocols in the future;
- They inform the user of the resolution;
- They ensure that all the details of the solution are clear and legible.
This level of detail reduces the risk of conflict between the different stakeholders.
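The closing actions above lend themselves to a simple checklist. The sketch below shows one possible way to verify that nothing is missing before an incident is closed. The `ClosureReport` class and its field names are hypothetical and only reflect the three actions listed in this step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClosureReport:
    details_recorded: bool       # all incident details have been logged
    time_spent_minutes: int      # time spent on the incident
    user_informed: bool          # the user was told about the resolution
    solution_description: str    # clear description of the solution


def missing_before_closing(report: ClosureReport) -> List[str]:
    """List the closing actions that have not been completed yet."""
    missing = []
    if not report.details_recorded:
        missing.append("incident details")
    if report.time_spent_minutes <= 0:
        missing.append("time spent")
    if not report.user_informed:
        missing.append("user notification")
    if not report.solution_description.strip():
        missing.append("solution description")
    return missing


# Invented example: the user has not been informed yet.
report = ClosureReport(
    details_recorded=True,
    time_spent_minutes=45,
    user_informed=False,
    solution_description="Replaced the faulty network cable.",
)
print(missing_before_closing(report))
```

An incident would only be closed once the returned list is empty.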
Stakeholders in incident management
Different stakeholders are involved in incident management. Although they differ from one organisation to another, a few basic roles can be identified:
- The requester/user: reports the incident, clearly specifying what it is. The technical team may also call on them at the end of the process to respond to enquiries.
- The different levels of support: depending on their level, the support teams provide the solutions needed to resolve the incident, and sometimes reassign the unresolved incident to the next level up.
- The incident manager: ensures that incident management is carried out correctly, plans the procedure and may recommend areas for improvement.
- The process owner: within the company, this person assumes responsibility for the incident management process in general. They may also be responsible for defining the KPIs (Key Performance Indicators).
10 best practices for managing your incidents
To be better prepared to manage IT incidents and minimise their impact on your organisation's operations, we recommend that you follow these 10 best practices:
- ✅ Train staff. Make sure the support team is well trained on procedures and tools. The aim is to ensure a diagnosis that is both rapid and accurate.
- ✅ Prioritise effectively. Establish clear criteria to intelligently prioritise incidents according to their severity or impact on the business.
- ✅ Establish rigorous documentation. Document each stage of resolution, from diagnosis to corrective action, for effective follow-up and future learning.
- ✅ Communicate transparently. Communicate clearly and regularly with stakeholders to keep them informed of the status of the incident and the actions taken.
- ✅ Implement a validation process. Before closing any incident, validate the resolution with users. This confirms that their problems have been fully resolved.
- ✅ Carry out a post-incident review. It will serve to identify root causes as well as potential areas for improvement.
- ✅ Update the knowledge base. Regularly update the knowledge base with incident resolution information, again to help resolve similar incidents in the future.
- ✅ Automate repetitive tasks. Use automation to manage routine tasks, such as triaging incidents. The time saved will allow the team to focus on more complex problems.
- ✅ Think "continuous improvement". Carry out regular audits of your incident management procedure, with the aim of identifying opportunities for improvement.
- ✅ Use an incident management tool. This is undoubtedly the most important tip! By investing in a robust incident management system (an ITSM tool in particular), you can track and document all incidents centrally.
The right tools for incident management
You now have a clearer picture of the challenges of incident management, but perhaps you're wondering how to put all these recommendations into practice? Are you picturing yourself running your incident management procedure with an Excel spreadsheet or a traditional project management tool?
Fortunately, specific software has been developed to support your teams at every stage of the incident management procedure.
To help you, have a look at our selection ✔️:
- Jira. Developed by Atlassian, the Jira ticketing tool standardises the processing of tickets opened following the reporting of an incident.
  😀 Why Jira?
  - create tickets with a precise level of information (descriptions, severity level, etc.) and follow all the processes required to manage them;
  - easily classify and prioritise bugs, and assign them to the right employee or department;
  - integrate your tickets into a ready-made workflow, or one that can be customised to suit your needs and processes.
- NinjaOne. NinjaOne is a complete IT asset management solution for SMEs, mid-sized companies and large enterprises.
  😀 Why NinjaOne?
  - centrally and proactively supervise your entire IT infrastructure to detect incidents as early as possible;
  - automatically and reliably apply the necessary patches to all your endpoints;
  - store all the standardised, structured documentation relating to your processes on the platform.
- Octopus. Octopus is an ITSM (Information Technology Service Management) tool, i.e. IT service management software.
  😀 Why Octopus?
  - benefit from a tool developed in accordance with ITIL best practices: your teams can apply them naturally without needing to master them perfectly beforehand;
  - easily manage requests from your users, whether for incidents or service requests;
  - improve preventive action thanks to a database that manages all aspects of the configuration of your information systems.
- Splunk Enterprise Security. Splunk Enterprise Security is a SIEM (Security Information and Event Management) solution designed to help you strengthen the security of your IT systems and manage incidents.
  😀 Why Splunk Enterprise Security?
  - benefit from a solution focused on analytics, which streamlines cybersecurity-related tasks;
  - get real-time insight through customised dashboards and views;
  - detect incidents faster and take preventive action.
What are the key points of IT incident management?
Incident management, as standardised by ITIL, is a procedure that should be incorporated into your information system as quickly as possible, as it promises to provide a clear and rapid response in the event of an incident.
What's more, it gradually leads to a reduction in the number of incidents by feeding into your problem management processes, and hence your preventive actions.
And the good news is that everyone wins when such a working method is put into practice:
- technical teams work more efficiently and transparently;
- users are less affected by bugs and more satisfied with your product;
- the company suffers fewer losses in the event of a critical incident.
Finally, it's worth remembering that good incident management goes hand in hand with the use of relevant tools, which support your process and save your teams precious time.