ITIL Essentials

What is ITIL?  

I just took some training and passed an exam on ITIL, a framework for IT Service Management (ITSM). But what can you really learn from a set of books on IT Infrastructure published by a UK government agency? Oh, quite a lot!

The idea behind it is to offer best practices for IT processes, using consistent terms that can be understood globally, no matter what IT environment you work in. If you find any value in similar frameworks such as ISO 9000, CMMI or EFQM, you will probably benefit from learning about ITIL. For me, the most important aspect of this training was getting a better understanding of what this ITIL buzz is all about and learning the terminology it uses. It helps put the entire set of IT processes in perspective.

So what are these processes anyway? Well, the ITIL books define a number of them and, since they all interact with each other, it’s actually hard to talk about each one individually. They fall into two major groups, each covered in its own book: Service Support and Service Delivery. The Service Support book describes the processes (and one function) that serve your end users (called simply IT users). The Service Delivery book covers the processes that serve the teams that sponsor or pay for those IT services (called simply IT customers). There are other books in the library, but these are the main ones.

1. Service Support (User-Facing)

1a. Service Desk: Central point of contact for users and the only way for a user to interact with your IT infrastructure. This one is actually a function (as opposed to a process), since it is usually a group or a department. You need a good system in place to get users the answers they need, which sometimes means logging an incident.

1b. Incident Management: Restore service to users as quickly as possible. When an IT service is interrupted or not working as expected (you find out via the Service Desk or some monitoring tool), you need to work on getting things back to normal. This is where the escalation (both functional and hierarchical) process is defined.
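Incident Management also has to decide which interruptions to handle first. ITIL derives an incident's priority from its impact and urgency, but leaves the exact scale to each organization. The sketch below assumes a hypothetical three-level scale and a simple additive matrix; the names `Level` and `incident_priority` are illustrative, not part of ITIL.

```python
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical three-level scale for impact and urgency; 1 is the most severe.
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def incident_priority(impact: Level, urgency: Level) -> int:
    """Return a priority from 1 (handle first) to 5 (handle last).

    Assumes a simple additive matrix: impact + urgency ranges from 2 to 6,
    shifted down by one. Real service desks usually tune the matrix by hand.
    """
    return int(impact) + int(urgency) - 1
```

For example, a high-impact but low-urgency incident gets priority 3, the middle of the range. Many organizations replace the formula with an explicit lookup table so individual cells can be adjusted.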

1c. Problem Management: Understand problems and make sure they have minimum impact. Some incidents will turn out to be symptoms of real problems, and most of the effort here goes into figuring out what went wrong and making sure you get to a fix or a workaround. This is different from Incident Management and is not the result of an escalation. The main thing is to turn problems into known errors (by determining the root cause) and then eliminate known errors (by raising a request for change). It also includes the post-implementation review of problems.
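The problem lifecycle (problem, then known error, then request for change) can be modeled as a small state machine. This is a minimal sketch under stated assumptions: the `ProblemRecord` class, its state names and any IDs you pass in are hypothetical, and a real tool would track much more (ownership, priority, links to the CMDB).

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    PROBLEM = auto()      # underlying cause still unknown
    KNOWN_ERROR = auto()  # root cause found, possibly with a workaround
    RFC_RAISED = auto()   # request for change raised to remove the error
    CLOSED = auto()       # change implemented and reviewed


@dataclass
class ProblemRecord:
    # Hypothetical record; field names are illustrative only.
    summary: str
    incident_ids: List[str] = field(default_factory=list)
    state: State = State.PROBLEM
    root_cause: Optional[str] = None
    workaround: Optional[str] = None
    rfc_id: Optional[str] = None

    def diagnose(self, root_cause: str, workaround: Optional[str] = None) -> None:
        # Determining the root cause turns a problem into a known error.
        if self.state is not State.PROBLEM:
            raise ValueError("only an open problem can be diagnosed")
        self.root_cause = root_cause
        self.workaround = workaround
        self.state = State.KNOWN_ERROR

    def raise_rfc(self, rfc_id: str) -> None:
        # A known error is eliminated through Change Management via an RFC.
        if self.state is not State.KNOWN_ERROR:
            raise ValueError("an RFC is raised only for a known error")
        self.rfc_id = rfc_id
        self.state = State.RFC_RAISED

    def close_after_review(self) -> None:
        # The review confirms the change removed the error.
        if self.state is not State.RFC_RAISED:
            raise ValueError("close only after an RFC has been raised")
        self.state = State.CLOSED
```

For example, a record created from two related incidents moves to `KNOWN_ERROR` once `diagnose` is called with a root cause, and only then can `raise_rfc` hand it over to Change Management. Trying to raise an RFC on an undiagnosed problem raises `ValueError`, which mirrors the order of steps described above.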

1d. Change Management: Implement large quantities of changes with minimum disruption. Every request for change to the production environment needs to be assessed. Some changes are pre-approved (standard changes) and others need to follow an approval process (either normal or urgent). The process includes requests for change (RFC), the forward schedule of changes (FSC), Change Advisory Board (CAB) and CAB Emergency Committee (CAB/EC) meetings, and Post-Implementation Reviews (PIR).
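The approval routing can be sketched in a few lines. This assumes, for illustration only, that normal changes go to the CAB, urgent ones to the CAB/EC, and that the forward schedule of changes is simply the approved (or pre-approved) RFCs ordered by planned date. The `RFC` class, `approver` and `forward_schedule` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Iterable, List, Set


class ChangeType(Enum):
    STANDARD = "standard"  # pre-approved, routine change
    NORMAL = "normal"      # assumed to need CAB approval
    URGENT = "urgent"      # assumed to need CAB/EC approval


@dataclass
class RFC:
    rfc_id: str
    change_type: ChangeType
    planned: date  # planned implementation date


def approver(change_type: ChangeType) -> str:
    """Return who must approve this kind of change (hypothetical routing)."""
    if change_type is ChangeType.STANDARD:
        return "pre-approved"
    if change_type is ChangeType.NORMAL:
        return "CAB"
    return "CAB/EC"


def forward_schedule(rfcs: Iterable[RFC], approved_ids: Set[str]) -> List[RFC]:
    """Standard changes plus explicitly approved RFCs, ordered by planned date."""
    scheduled = [
        r for r in rfcs
        if r.change_type is ChangeType.STANDARD or r.rfc_id in approved_ids
    ]
    return sorted(scheduled, key=lambda r: r.planned)
```

An urgent RFC that the CAB/EC has not yet approved simply stays off the schedule, while standard changes appear without any approval step.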

1e. Configuration Management: Logical model of the IT infrastructure. The information is stored as Configuration Items (CIs) in the Configuration Management Database (CMDB). Contrary to common belief, a CI is not just a piece of hardware; it can also be software, documentation, a procedure, etc. CIs can be related to each other, and a CI can be a single component or an entire system.
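Because CIs are related to each other, a CMDB behaves much like a graph, and one common use is impact analysis: if this CI fails or changes, what else is affected? Here is a minimal in-memory sketch; the class names, the `ci_type` values and the single "depends on" relationship are assumptions for illustration, not a prescribed CMDB schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ConfigurationItem:
    ci_id: str
    ci_type: str  # e.g. "hardware", "software", "documentation", "service"
    name: str


class CMDB:
    """Toy CMDB holding CIs and 'depends on' relationships between them."""

    def __init__(self) -> None:
        self.items: Dict[str, ConfigurationItem] = {}
        self.depends_on: Dict[str, Set[str]] = {}

    def add(self, ci: ConfigurationItem) -> None:
        self.items[ci.ci_id] = ci
        self.depends_on.setdefault(ci.ci_id, set())

    def relate(self, parent_id: str, child_id: str) -> None:
        # The parent depends on the child; both must already be added.
        self.depends_on[parent_id].add(child_id)

    def affected_by(self, ci_id: str) -> Set[str]:
        """Return every CI that depends on ci_id, directly or indirectly."""
        affected: Set[str] = set()
        frontier: List[str] = [ci_id]
        while frontier:
            current = frontier.pop()
            for parent, children in self.depends_on.items():
                if current in children and parent not in affected:
                    affected.add(parent)
                    frontier.append(parent)
        return affected
```

With a mail service that depends on mail server software, which in turn runs on a physical server, calling `affected_by` on the server returns both the software and the service. The runbook, also a CI, is unaffected because nothing about it depends on the server.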

1f. Release Management: Holistic view of changes. A release is a set (or bundle) of new or changed configuration items, which will be implemented as a set in the production environment. People usually think only of software here, but a release can include hardware as well. Release management owns the Definitive Software Library (DSL) and the Definitive Hardware Store (DHS).

2. Service Delivery (Customer-Facing)

2a. Service Level Management: Maintain and gradually improve business-aligned service quality. This includes planning, negotiating and managing Service Level Requirements, Service Targets, Key Service Items and Service Level Agreements (SLAs). This is where all the details of a typical SLA are defined, as well as OLAs (Operational Level Agreements, which are agreements between internal IT functions) and UCs (Underpinning Contracts, which are agreements with suppliers). Quality is improved over time through Service Improvement Programs (SIPs).

2b. Availability Management: Optimize infrastructure capability to provide services as needed. This process describes plans for keeping the services highly available (according to the SLAs) and for recovering them if they fail. It defines terms like MTBF (Mean Time Between Failures), MTBSI (Mean Time Between System Incidents) and MTTR (Mean Time to Repair). It also defines the Availability Plan and related methods and techniques like CFIA, FTA, CRAMM, SOA, TOP and ITAMM.
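These metrics are easiest to see with numbers. The sketch below assumes the common ITIL v2 reading: MTTR is the average downtime per incident, MTBF is the average uptime from one restoration to the next failure, MTBSI is the average interval between the start of consecutive incidents (roughly MTBF plus MTTR), and availability is agreed service time minus downtime, as a percentage of agreed service time. The function name and the incident times are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def availability_metrics(
    incidents: List[Tuple[float, float]], agreed_hours: float
) -> Dict[str, float]:
    """Compute MTTR, MTBF, MTBSI and availability for one measurement period.

    incidents: (failed_at, restored_at) pairs in hours from the start of the
    period, sorted by failure time and non-overlapping.
    agreed_hours: agreed service time for the period.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    downtimes = [restored - failed for failed, restored in incidents]
    # Uptime: from one restoration to the next failure.
    uptimes = [incidents[i + 1][0] - incidents[i][1] for i in range(len(incidents) - 1)]
    # Interval between the start of consecutive incidents.
    gaps = [incidents[i + 1][0] - incidents[i][0] for i in range(len(incidents) - 1)]
    total_down = sum(downtimes)
    return {
        "mttr": total_down / len(downtimes),
        "mtbf": sum(uptimes) / len(uptimes) if uptimes else math.nan,
        "mtbsi": sum(gaps) / len(gaps) if gaps else math.nan,
        "availability_pct": (agreed_hours - total_down) / agreed_hours * 100,
    }
```

For example, three incidents in a 168-hour week, failing at hours 10, 50 and 100 and restored at 12, 51 and 103, give an MTTR of 2 hours, an MTBF of 43.5 hours, an MTBSI of 45 hours and an availability of about 96.4%.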

2c. Security Management: Ensure the Confidentiality, Integrity and Availability of the data associated with a service. The goal is to ensure compliance with the IT security policy. Note that this is related to Availability Management, but the concern here is with the information, not the services.

2d. Financial Management: Cost-effective management of IT assets and services. This describes two mandatory aspects (Budget and Accounting) and one optional one (Charging). It also covers a lot of financial detail such as depreciation, cost models and charging policies.
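Two of the calculations behind this area are simple enough to show. The sketch below uses straight-line depreciation (one of several methods an organization might choose) and, for Charging, a proportional split of cost by usage, which is just one possible charging policy. The function names, usage units and figures are hypothetical.

```python
from typing import Dict, List


def straight_line_book_values(cost: float, residual: float, years: int) -> List[float]:
    """Book value at the end of each year under straight-line depreciation."""
    annual = (cost - residual) / years  # same depreciation charge every year
    return [cost - annual * (year + 1) for year in range(years)]


def charge_by_usage(total_cost: float, usage: Dict[str, float]) -> Dict[str, float]:
    """Split a cost across customers in proportion to their recorded usage."""
    total_usage = sum(usage.values())
    return {customer: total_cost * units / total_usage for customer, units in usage.items()}
```

For a server bought for 12,000 with a residual value of 2,000 over 5 years, the book value drops by 2,000 a year, from 10,000 at the end of year one to 2,000 at the end of year five. Splitting 10,000 between two customers that used 30 and 10 units charges them 7,500 and 2,500.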

2e. Capacity Management: Provide for adequate capacity to meet the needs of the business. Describes Business Capacity Management (BCM), Service Capacity Management (SCM), Resource Capacity Management (RCM) and the Capacity Database (CDB). This last one is the basis for a number of reports on current and future capacity requirements. This also includes application sizing, demand management, resource management and performance management.

2f. IT Service Continuity Management: Ensure the IT services can be recovered. This has a lot to do with risk analysis and recovery options, including cold, warm and hot standby. The Business Continuity Plan and Business Impact Analysis (BIA) are defined here.

Conclusion 

As you can see, much of it is about defining and understanding specific terms like Service Request, Incident, Problem, Known Error, Workaround, Change, Standard Change, Request for Change, Root Cause, Escalation, Impact, Urgency, Priority, Release, Release Unit, Rollout, Definitive Software Library, Definitive Hardware Store, Service Level, Service Level Agreement, Service Improvement Program, Service Quality Plan, Budget, Accounting, Charging and Risk Management. Many of these come with acronyms, like CI, RFC, CAB, CAB/EC, FSC, PSA, PIR, C&CM, CMDB, DSL, DHS, SLA, OLA, UC, SIP, SQP, MTBF, MTBSI, MTTR, VBF and CDB.

The main benefit, in addition to providing the best practices in the industry, is helping you communicate efficiently with your fellow IT professionals around the world, with little ambiguity. 

I should warn you, though, that ITIL is a very generic framework and it's not tied to any specific architecture or vendor. There is also no detail on specific IT functions like Networking, Storage and Directory Services. They keep it pretty generic on purpose, but this gap is filled by other frameworks, like the Microsoft Operations Framework (MOF), which extends ITIL and adds richer details in several areas.