In the first four installments (Part 1, Part 2, Part 3, and Part 4) of our Cloud and NG9-1-1 series, we gave a primer on cloud technology, deployment options, service models, and security. This week we look at Quality of Service (QoS), Service Level Agreements (SLAs), and putting together your requirements.
Quality of Service and Service Level Agreements
Vendor SLAs have long been part of the public safety lexicon; however, as critical services become more interdependent and sometimes fall outside the direct control of the emergency communications agency, these agreements become more important. Whether the public safety entity decides to manage its own cloud infrastructure or outsource it to third parties, QoS and SLAs are key to success. Availability of emergency communications services depends on redundancy, on eliminating single points of failure, and on effective failover and backup processes should those redundant systems fail. Strong service level agreements ensure that the provider(s) of cloud services are aligned with the stringent requirements of public safety.
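To make the value of redundancy concrete, a common textbook estimate for identical components deployed in parallel (the service is up if at least one copy is up) is A = 1 - (1 - a)^n, where a is the availability of one copy and n is the number of copies. This assumes failures are independent; sites that share power, network paths, or software tend to fail together, so treat the result as an optimistic estimate. The function name and the 99.9% per-site figure below are hypothetical, for illustration only.

```python
def parallel_availability(per_site: float, sites: int) -> float:
    """Estimated probability that at least one of `sites` copies is up.

    Assumes independent failures (an optimistic simplification).
    """
    if not 0.0 <= per_site <= 1.0:
        raise ValueError("per_site must be between 0 and 1")
    if sites < 1:
        raise ValueError("need at least one site")
    # The service is down only if every site is down at the same time.
    return 1.0 - (1.0 - per_site) ** sites


if __name__ == "__main__":
    # Hypothetical per-site availability of 99.9%.
    for n in (1, 2, 3):
        print(f"{n} site(s): {parallel_availability(0.999, n):.7%}")
```

Under the independence assumption, two sites at 99.9% each yield about 99.9999%, which is why the failover path and the independence of the sites matter as much as the headline number.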
At a high level, an SLA should include the following components:
- The list of services the provider will deliver and a complete definition of each service.
- Metrics to determine whether the provider is delivering the service as promised and an auditing mechanism to monitor the service.
- Responsibilities of the provider and the agency and remedies available to both if the terms of the SLA are not met.
- A description of how the SLA will change over time.
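One way to keep these four components explicit while drafting is to capture them as structured data that both the agency and the provider can review. The sketch below is a hypothetical schema, not a standard SLA format; every class name, field, and example value is an illustrative assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str                  # e.g. "availability" (hypothetical)
    target: float              # threshold the provider commits to
    audit_method: str = ""     # how the metric is measured and audited


@dataclass
class Service:
    name: str
    definition: str            # complete definition of what is delivered
    metrics: list[Metric] = field(default_factory=list)


@dataclass
class SLA:
    services: list[Service]
    provider_responsibilities: list[str]
    agency_responsibilities: list[str]
    remedies: list[str]            # what happens if terms are not met
    review_interval_months: int    # how often the SLA is revisited

    def missing_components(self) -> list[str]:
        """Name the high-level SLA components that are still empty."""
        gaps = []
        if not self.services:
            gaps.append("services")
        if any(not s.metrics for s in self.services):
            gaps.append("metrics")
        if not (self.provider_responsibilities and self.agency_responsibilities):
            gaps.append("responsibilities")
        if not self.remedies:
            gaps.append("remedies")
        if self.review_interval_months <= 0:
            gaps.append("change process")
        return gaps


# Hypothetical draft: the service has no metrics and no remedies yet.
draft = SLA(
    services=[Service("call routing", "Route calls to the correct answering point")],
    provider_responsibilities=["Operate geo-redundant routing"],
    agency_responsibilities=["Report misrouted calls promptly"],
    remedies=[],
    review_interval_months=12,
)
print(draft.missing_components())  # ['metrics', 'remedies']
```

A checklist like this does not make an SLA enforceable on its own, but it makes gaps visible before negotiation rather than after an incident.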
IBM compiled a very useful list of responsibilities to consider when developing an SLA (source: "Cloud Computing Use Cases Whitepaper" Version 4.0, developerWorks Editors, IBM). Although it is oriented toward evaluating a third-party provider, it is a helpful checklist regardless of the service model. A subset of the items relevant to public safety use of the cloud follows:
- Privacy and data encryption: Basic privacy concerns are addressed by requirements such as data encryption, retention, and deletion. An SLA should make clear how the cloud provider isolates data and applications in a multi-tenant environment and which access control policies apply.
- Data retention and deletion: What are the local jurisdiction's requirements for data retention and purging?
- Regulatory compliance: If the type of data is subject to regulation, the cloud provider must be able to prove compliance. For example, HIPAA may require different handling of some data elements within the network.
- Transparency: What are the provider's obligations to notify agencies proactively when the terms of the SLA are breached? This includes infrastructure issues such as outages and performance problems, as well as security incidents.
- Certification: The provider should be responsible for proving required certifications and keeping them current.
- Performance definitions: What does uptime mean? Geo-redundancy of the service is a must-have, but what does that really mean? Are all the geographically redundant servers available, or only one?
- Monitoring and auditing: How will the SLA be monitored and performance audited? You might want to specify a neutral third-party organization to monitor the provider's performance, and the SLA should be clear about any potential audits (which can be costly to both the provider and the agency). The SLA should also address the trusted partners that are able to access the network. What controls are in place to ensure that only trusted partners meeting your minimum standards are connected to the ESInet?
- Metrics: These are the measurable quantities that can be monitored as they happen and audited after the fact. The metrics of an SLA must be objectively and unambiguously defined. Some common metrics include:
  - Throughput: How quickly the system responds. Are calls routed correctly and quickly?
  - Reliability: How often the system is available.
  - Load balancing: When elasticity kicks in. How does the service take advantage of the core benefits of a cloud service?
  - Durability: How likely the system is to lose data.
  - Elasticity: How much a resource can grow.
  - Linearity: How the system performs as load increases. At what point is call quality compromised, and how is that managed?
  - Agility: How quickly the provider responds to load changes.
  - Automation: Percentage of requests handled without human interaction. Is human interaction required for failover processes?
  - Customer service response times: How quickly are issues addressed?
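The "what does uptime mean?" question above can be made concrete. The same monitoring data produces different uptime figures depending on whether an interval counts as "up" when any geo-redundant site is serving or only when all of them are. The sketch assumes hypothetical per-interval health-check results (True means healthy) from two sites; the sampling scheme and function name are assumptions, not something prescribed by an SLA template or by i3.

```python
from typing import Sequence

# One inner sequence per site, one boolean per fixed-length monitoring
# interval (hypothetical data format).
Samples = Sequence[Sequence[bool]]


def uptime(samples: Samples, require_all: bool) -> float:
    """Fraction of intervals counted as 'up' under the chosen definition.

    require_all=False: up if at least one site is healthy.
    require_all=True:  up only if every site is healthy.
    """
    if not samples or not samples[0]:
        raise ValueError("need at least one site and one interval")
    intervals = len(samples[0])
    if any(len(site) != intervals for site in samples):
        raise ValueError("all sites must report the same intervals")
    combine = all if require_all else any
    up = sum(1 for i in range(intervals) if combine(site[i] for site in samples))
    return up / intervals


# Hypothetical health checks: each site has an outage the other covers.
site_a = [True, True, False, True, True, True, True, True, True, True]
site_b = [True, True, True, True, False, False, True, True, True, True]

print(uptime([site_a, site_b], require_all=False))  # 1.0
print(uptime([site_a, site_b], require_all=True))   # 0.7
```

The same ten intervals read as 100% uptime under one definition and 70% under the other, which is why the SLA has to state which definition applies.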
It is easy to request extremely high service levels for every function, but doing so can be very costly. One option to consider in developing SLAs for NG9-1-1 related solutions is to tier the communications services. Which functions are critical to the core mission of answering calls and dispatching? As you move outward from those core functions, consider what service level each function actually needs (and what you are willing to pay for).
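A tiered approach can be sketched as a mapping from each tier to its functions and an availability target, along with the annual downtime budget that target implies. The tier names, the functions assigned to each tier, and the targets below are hypothetical placeholders, not values from NENA i3 or any other standard; each agency would set its own.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year

# Hypothetical tiers and targets, for illustration only.
TIERS = {
    "core": {"functions": ["call routing", "call answering", "dispatch"],
             "availability": 0.99999},
    "supporting": {"functions": ["mapping", "records lookup"],
                   "availability": 0.9999},
    "administrative": {"functions": ["reporting", "analytics"],
                       "availability": 0.999},
}


def downtime_budget_minutes(availability: float) -> float:
    """Allowed minutes of downtime per year at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def tier_for(function: str) -> str:
    """Return the tier a function has been assigned to."""
    for name, tier in TIERS.items():
        if function in tier["functions"]:
            return name
    raise KeyError(f"{function!r} has not been assigned to a tier")


for name, tier in TIERS.items():
    budget = downtime_budget_minutes(tier["availability"])
    print(f"{name:15} {tier['availability']:.5%}  ~{budget:.1f} min/yr")
```

With these made-up targets, the core tier is allowed about 5.3 minutes of downtime a year versus about 525.6 for the administrative tier, which illustrates why requesting the strictest level across the board gets expensive.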
Putting Together Your Requirements
As with any solution, it is imperative to carefully document your requirements for both functionality and performance. Through the i3 process, NENA and the participants in defining NG9-1-1 have done an outstanding job of defining a broad set of capabilities expected of an NG solution. Because the solution is inherently interoperable, this baseline definition is key. However, each agency will have localized needs; there is no one-size-fits-all solution. Within the broad framework defined as "NG9-1-1", there is enough flexibility to create a solution that fits a specific agency in terms of functionality, performance, and cost while still interoperating seamlessly with other jurisdictions. Each entity should carefully evaluate its functional needs against the broad definition of what is possible.