Cloud outage planning: As NY Times tech columnist Steve Lohr points out today in “Amazon’s Trouble Raises Cloud Computing Doubts,” this weekend’s storms brought down several Amazon-based services, including Netflix, Pinterest and Instagram, and the outage should provide a learning experience.
As I have said for some time, the cloud is not a distributed computing infrastructure. The cloud remains place-based; it just isn’t your place. All organizations using cloud services, from any provider, owe it to their customers and, if they are public, to their shareholders to perform due diligence on the implementation of the services they lease. That does not mean reading a Service Level Agreement and going, “hey, Amazon has this covered.” Outsourcing IT is not about outsourcing business responsibility. Managers who outsource to cloud service providers, regardless of the size or reputation of the service provider, must look at the actual deployment. They must look at the hardware, the people, the fail-over plan, the disaster recovery plan and, as this most recent case points out, the location of the servers themselves.
Cloud Outage Planning: responsibility and accountability
The cloud has abstracted responsibility, and organizations and the managers who work for them must be held accountable for their lack of foresight. Responsible planners must ask if their service providers have contingencies that go beyond paying for downtime. The cloud provider’s service level agreement becomes the outsourcer’s service level agreement. If you offer a 24-7 service, then you need to make sure your providers don’t just say they can provide it; you need to understand how they plan to provide it in all but the most extraordinary of circumstances.
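To put rough numbers on the idea that the provider's service level agreement becomes your own, here is a minimal Python sketch. The availability figures (99.95% and 99.9%) are hypothetical, chosen only for illustration, and are not taken from any real provider's agreement. The calculation also assumes that dependencies fail independently, which is exactly the assumption that breaks when they all sit in the same data center.

```python
def allowed_downtime_hours(availability_pct: float, period_hours: float = 365 * 24) -> float:
    """Hours of downtime permitted per period at a given availability percentage."""
    return period_hours * (1 - availability_pct / 100)


def composite_availability(*availabilities_pct: float) -> float:
    """Best-case availability of a service that needs every listed dependency up.

    Assumes independent failures (an illustrative simplification).
    """
    result = 1.0
    for pct in availabilities_pct:
        result *= pct / 100
    return result * 100


if __name__ == "__main__":
    # Hypothetical figures: a 99.95% compute SLA and a 99.9% storage SLA.
    print(round(allowed_downtime_hours(99.95), 2))        # about 4.38 hours per year
    print(round(composite_availability(99.95, 99.9), 2))  # about 99.85 percent
```

Under these assumptions, a service built on both dependencies cannot honestly promise more than the composite figure, no matter what its own marketing says about 24-7 operation.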
The providers themselves must start thinking like scenario planners and actively creating more distributed, rapid-response infrastructure.
And yes, cloud services will get more expensive. When you engineer a service for robustness, you pay for the peace of mind. That is why an Otterbox case for your iPhone costs more than one from other suppliers. Otterbox designed their case for your worst case. I don’t think cloud providers have done this. The lack of history and the arrogance of technology reliability lead managers to cut corners that only reveal themselves when those corners are traversed, as they were this weekend.
As much as this outage should drive cloud service providers to better engineering, it should really drive large organizations that rely on the cloud to better procurement practices. Many start-ups don’t include mature managers in critical roles. They may hire a lawyer to write a contract, but do they have decades of disaster planning on their teams? Most likely not. This also means angels, VCs, and other investors should, as I say all too often, increase the requirement for management acumen across their investment portfolios. Investor guidance is also a part of responsibility. Everybody wants a low-cost payout, but when the underlying foundation proves unstable, the cost of re-engineering is higher than the cost of doing it right the first time.
Here are three other posts in this blog that offer caution for the cloud:
(perhaps you will read them now!)