So, you have read all of the IT trade press discussions about moving to the cloud, and are now considering taking that all-important next step: investigating the pros and cons of migrating your Enterprise IT infrastructure to the cloud. After all, who wouldn't want to shave 20-30% off of an existing IT budget AND increase IT responsiveness to the business at the same time? Any CIO/CFO is going to start by building the business case for moving to the cloud that makes sense for their business.
Enterprise IT organizations approach public cloud adoption from different points in IT maturity. IT organizations that have not done ANY virtualization, consolidation, or application rationalization have much further to go to prepare for the jump to public cloud than those who have done these activities. Therefore, the business case for any individual IT organization will largely depend on its IT maturity.
So how do we begin answering the question: "Does it make business sense to move my server infrastructure/applications to the cloud?" I find it helps to start with the following framework:
- Budgeting - How do I budget for projects, and how will that differ with a public cloud approach?
- Comparing on premise CAPEX vs cloud OPEX costs – Cloud is based on an OPEX expense model, while most Enterprise IT shops today function on a CAPEX model. How do I compare apples to apples?
- Switching costs – How do I determine the amount of work/expense I need to spend to get to a cloud model?
Almost all IT shops I have worked with budget based on the projects they choose to take on each year. If the business (FLOSHIM – Finance, Legal, Operations, Sales, HR, IT, and Marketing) requires a new IT system be deployed, a project plan and budget are determined for that project. Hardware, software, and people/services costs are assessed, the project is kicked off, and ultimately completed per the plan. A cloud migration creates a real problem for most IT shops because the original IT project never contemplated a large scale migration to the cloud, or the costs associated with that move. Therefore, a new, non-business-aligned project must be spun up and business justified.
A cloud migration project looks similar to an outsourcing contract. What makes it different is the variable cost component of cloud. Most IT outsourcing contracts have a fixed spend agreement for managing existing infrastructure components, with variable costs for adding new capacity based on business growth. Although cloud is similar to this, there are some critically important differences.
Business impact of elasticity
Cloud services are priced per hour (in most cases). The short time window means cloud services are “elastic” (services spun up/spun down as needed). There are enormous benefits to elasticity from a business perspective:
- Time to solution – a cloud infrastructure can be spun up in hours vs an on premise solution being spun up in weeks or months.
- No stranded equipment – if a project is canceled unexpectedly, or if the solution is no longer needed, the service is simply turned off and the billing stops.
- New business problems can be solved – It is not uncommon for businesses now to come into the cloud, spin up thousands of VM’s in the morning, and turn them off in the afternoon. This creates new business opportunities that would never be economically feasible if done on premise.
These 3 reasons alone are often enough to spur an IT organization to consider a broad scale migration to the public cloud.
Challenges to budgeting with cloud
As great as elasticity is, it does have a downside. From a budgeting perspective, cloud is hard. Cloud is almost infinitely variable. Servers can be turned on in the morning, shut off at night, scaled up during a major product release, or turned off when a business event forces an IT change. Now imagine trying to predict what spend might look like in any given year. As an organization, predicting cloud spend can feel like an exercise in futility and risk.
In addition, companies like Microsoft offer prepaid service spending plans (Azure Monetary Commit) that allow you to buy a large chunk of service credits that are lost if not consumed within a specific time period (usually one year). If a project comes along that consumes more of the service credits than were purchased, how is the overspend allocated back to the business? How does the IT finance organization ensure that only the right amount of credit is purchased, yet that all of it is consumed by the end of the year so no credits are wasted?
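As a hedged illustration of the problem, consider a simple straight-line burn-down projection for a prepaid credit pool. All of the figures below are hypothetical placeholders, not actual Azure pricing or commitment terms:

```python
# Hypothetical sketch of forecasting prepaid credit consumption.
# The credit pool size and monthly spend figures are made-up illustrations.

def forecast_credits(total_credits, monthly_spend, term_months=12):
    """Project whether a prepaid credit pool will be exhausted early or expire unused."""
    months_elapsed = len(monthly_spend)
    burn_rate = sum(monthly_spend) / months_elapsed        # average spend per month
    projected_total = burn_rate * term_months              # straight-line projection
    if projected_total > total_credits:
        months_to_empty = total_credits / burn_rate
        return f"Overspend: credits projected to run out around month {months_to_empty:.0f}"
    return f"Underspend: ~${total_credits - projected_total:,.0f} in credits may expire unused"

# three months into a hypothetical $120,000 annual commitment
print(forecast_credits(120_000, [8_000, 9_500, 7_800]))
```

Even a toy model like this makes the tension visible: the same spend history flags wasted credits against a large commitment and an early run-out against a small one, which is exactly the balancing act the IT finance team has to perform every month.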
Clearly, any cloud migration is going to require some significant changes to how budgeting and financial management of the cloud service is done. The key is doing it over time to develop this muscle, and being careful not to overcommit to cloud until the organization is comfortable monitoring these services at scale. I have heard of many IT organizations making a cloud commitment for an application, only to receive a large monthly bill they didn't expect, and then deciding to bring the solution back on-premise out of fear of runaway, unpredictable costs. Make no mistake, the success or failure of a cloud migration will rest significantly on the success of the IT finance organization.
CAPEX vs OPEX
As hard as cloud budgeting is for Enterprise IT, the issue of CAPEX vs OPEX is even harder. However, before getting into the complexities of managing this for a typical IT organization, let’s first have a philosophical discussion as to why OPEX ultimately is so much better for an IT organization than CAPEX.
The tyranny of CAPEX
Per the budgeting discussion above, CAPEX is probably one of the most wasteful aspects of how Enterprise IT is run today. I find most IT organizations put MASSIVE effort into defining the best IT solution from the beginning…performing extensive costing/ROI analysis. But what almost all IT shops are universally bad at is tracking the financial usage of those assets after they are deployed. Below are a few real life horror stories I have heard over the years on this topic:
- Purchased equipment, but never used – One IT organization I worked with had a business project approved, and as a result, IT went ahead and purchased over 100 servers, representing over $1 million in CAPEX spend. Unfortunately, just prior to the project going live, the business pulled the plug. It was too late to send the servers back, and all of them had to be "repurposed". This meant accelerating existing projects, hunting for older servers to replace (ones that still had useful life left), and even dreaming up new projects that would never have been financially viable in the first place. Nobody ever knew exactly how those servers were repurposed, or what the financial benefit to the organization was of utilizing them. In fact, years later, after a software audit led to an overall asset management audit, dozens of brand new servers were found in closets, still in their original boxes…representing hundreds of thousands of dollars in waste.
- Decommissioned servers that "came back to life" – Another IT organization I worked with had a situation where a software audit led to over 50 servers being "discovered" that the IT organization thought had been decommissioned. The IT staff had left the decommissioned servers in the rack in case they could be used again in the future. A power spike in the datacenter caused these "decommissioned" servers to turn back on, and they were subsequently caught in a software audit scan. Even setting the unlicensed software costs aside, these servers had been running for over a year…each one chewing up power. Given that a server can consume anywhere from $600-800/year in electricity, the estimated power cost alone to the organization was around $30,000. Of course, this ignores the fact that these decommissioned servers had been forgotten, and never sold, repurposed, or in any way utilized.
- 1,500 "lost" servers – Another IT organization I worked with underwent a software audit where the 3rd party brought in to scan the organization found 1,500 servers that were completely unaccounted for in its asset management systems. For some inexplicable reason, WMI (Windows Management Instrumentation – a Windows Server management API/utility) was turned off on these servers. As a result, these servers were unmonitored and unmanaged…I suspected a potential security breach. In addition, there was no way to know whether these servers were being put to their best financial use.
Of course, these are all horror stories, and I am not saying every IT organization is this poorly managed. But they illustrate an important concept. As the old management adage goes, "you can't manage what you can't measure". Each of these scenarios represents a case where sizable CAPEX financial assets were not managed or measured after they were deployed, and as a result, wasteful misuse of those assets occurred.
So how does OPEX/cloud fix this problem? Very simply…cloud is, by its very nature, managed AND measured. Every month, the organization gets a bill for what it consumes. When IT AND the business have visibility into what is being spent, the organization can take action to correct misallocation of resources. In other words, with a switch to cloud, IT and the business are held accountable for the usage of IT assets. Here is another story of this principle being applied.
An IT organization I worked with decided to embark on a big data initiative. This group decided to go with a Hadoop solution they put up on AWS IaaS. Because of a miscalculation of network ingress/egress fees, the expected bill of $30,000 in the first month was instead $300,000. Of course, the IT organization and business didn’t sign up for this level of spending, and immediately decided to suspend the project, and move the solution in house by procuring their own servers.
You may be asking…"how is this a good story for the cloud?" The fact that they received a larger bill than expected meant the organization immediately took steps to rectify the situation, because the OPEX spend was VISIBLE. What is tragic in this story is that by moving the solution on premise, the finances of the project immediately became unmeasured and unknowable. Maybe the on-premise solution made financial sense, and maybe it didn't. Nobody will ever know whether the deployed solution made financial sense AFTER it was deployed.
What is tragic here is that the tyranny of CAPEX is a silent tyranny. It is a story that goes untold, and is often hidden in the bottom line of a company. Let’s assume for a minute that public cloud is more expensive than an on-premise private cloud (which it isn’t…it is indeed less). Even if the cost is more for cloud, I would still insist on pushing for public cloud simply because it makes the finances of the solution VISIBLE to all parties involved.
Budgeting switch to OPEX
Today, almost all IT organizations budget according to CAPEX. Very little of the IT budget is OPEX. Some industries like Utilities and Telecommunications LOVE CAPEX. They are very comfortable with the concept of CAPEX, and are masters at managing depreciation cycles. In the case of public utilities, CAPEX investments are simply passed along to rate payers. However, for most organizations, OPEX is preferred…making cloud a slam dunk if this is the case. However, transitioning to a cloud model does require a fundamental rethinking of budgeting for IT, and sponsorship from senior leaders in the finance organization.
CAPEX and OPEX comparisons
So now that the business case has been made for OPEX, how do we determine which is cheaper…cloud or on premise? Traditional IT is CAPEX, and cloud is OPEX. In most cases, determining the OPEX cost of the target environment in a migration is pretty easy to calculate. If I have 50 servers and/or VM's to migrate, I know I need 50 VM's of a certain capability spun up in the cloud. I simply take the hourly rate, multiply it out to a year, make any adjustments, and I now have my annual OPEX budget for the cloud environment. Prices are public, and every cloud provider has easy to use calculators to come up with these costs.
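The arithmetic above can be sketched in a few lines. The VM count, hourly rate, and utilization below are hypothetical placeholders; substitute figures from your provider's pricing calculator:

```python
# A minimal sketch of the annual-OPEX arithmetic described above.
# All rates and counts are hypothetical, not any provider's actual pricing.

HOURS_PER_YEAR = 24 * 365

def annual_cloud_opex(vm_count, hourly_rate, utilization=1.0):
    """Hourly rate x hours per year x VM count, adjusted for expected uptime."""
    return vm_count * hourly_rate * HOURS_PER_YEAR * utilization

# 50 VMs at a hypothetical $0.20/hour, running 24x7
always_on = annual_cloud_opex(50, 0.20)
# the same fleet shut down nights and weekends (~45% utilization)
elastic = annual_cloud_opex(50, 0.20, utilization=0.45)

print(f"Always on: ${always_on:,.0f}/yr, elastic: ${elastic:,.0f}/yr")
```

The utilization adjustment is where elasticity shows up in the budget: the same 50 VM's cost less than half as much if they only run during business hours.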
However, it is NOT so simple to calculate existing internal IT costs. In fact, I have found most IT organizations cannot tell you what it costs them to operate a server in any given year. This is because of the following challenges:
- Amortization schedules – Most IT equipment is typically depreciated over a 3-5 year cycle. However, most IT organizations will still continue to operate fully depreciated equipment. In some cases, I have seen servers that are well over 10 years old still in production. How do you calculate a cost for 10 year old hardware that was fully depreciated after 3 years? Cloud OPEX continues to bill year after year without ever getting a “free period” after the assets are fully depreciated.
- Licensing – This is an area where most IT organizations are not as efficient as they could be. Most server software licensing is done by core (a function of CPU processing). This is where the lack of visibility into CAPEX asset utilization really hurts. Newer hardware has significantly faster and more efficient cores. In the case of databases, performance is actually much more dependent on IOPS than CPUs (SSDs are your friend). Therefore, licensing costs (often the most expensive part of a solution) can be dramatically reduced by simply migrating from older hardware to newer hardware. Most cloud providers include licensing as part of their service offering. How do I use my old licenses in a cloud model? Will the software vendor give me credit for my past sunk license investment? It takes a bit of work to figure out these assumptions.
- Storage – The most staggering difference when comparing cloud costs to on premise is in the area of storage. On premise storage is frighteningly expensive, while cloud storage is increasingly given away for free as part of a larger service. But as many storage experts will tell you, 1 GB of storage in one situation can be a dramatically different cost in another. For example, a 1 TB hard drive at Costco is around $200, while 1 TB of fully redundant enterprise storage can run into the thousands of dollars. Cloud storage can be severely limited by IOPS, and egress fees can also change the dynamic, since the network in an IT shop is usually treated as free. Again, careful assumptions must be made to get to an apples-to-apples comparison.
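Pulling the three challenges above together, a hedged sketch of an apples-to-apples annual cost model for one on-premise server might look like the following. Every dollar figure is a hypothetical assumption to be replaced with real data; the point is that hardware depreciation is only one of several line items, and the REAL service life (not the 3-5 year books schedule) is what spreads the purchase price:

```python
# Hypothetical annual cost model for a single on-premise server.
# All dollar figures are illustrative assumptions, not real benchmarks.

def onprem_annual_cost(purchase_price, actual_life_years,
                       annual_licensing, annual_power,
                       annual_storage, annual_admin):
    """Spread the purchase price over the server's real service life, then add
    the recurring costs that a naive CAPEX-only comparison ignores."""
    hardware = purchase_price / actual_life_years
    return hardware + annual_licensing + annual_power + annual_storage + annual_admin

cost = onprem_annual_cost(
    purchase_price=8_000,    # hypothetical server price
    actual_life_years=6,     # depreciated in 3 years on the books, but runs for 6
    annual_licensing=3_000,  # per-core OS/database licensing (assumed)
    annual_power=700,        # mid-range of the $600-800/yr electricity figure above
    annual_storage=1_200,    # share of enterprise SAN cost (assumed)
    annual_admin=1_500,      # share of ops staff time (assumed)
)
print(f"All-in annual cost per server: ${cost:,.0f}")
```

Notice that in this (hypothetical) breakdown the hardware itself is the smallest line item; licensing, storage, and people dominate, which is exactly why "what does a server cost us per year?" is so hard for most IT organizations to answer.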
Based on the models I have come up with, IaaS servers are typically the same cost or slightly less than on-premise for typical application VM's. Storage in the cloud is almost always significantly cheaper than on-premise. However, if licensing is handled correctly and the right database architecture is followed on premise, databases are typically MUCH more expensive to host in the cloud.
Switching costs
Now we get to the fun part of this exercise. You have now determined that the cost of public cloud is lower than on premise, and you understand the value of elasticity and of OPEX vs CAPEX. So what is this switch going to cost your organization? There are many variables that will affect this:
- Timeline to switch – Moving Enterprise IT assets to the cloud is not a trivial undertaking. These projects will be measured in years…not months. Therefore, determining a correct cadence for your organization is critical. Obviously the faster the migration, the more cost up front.
- Your IT maturity – If you already have a solid Enterprise Architecture team, have already completed a great deal of virtualization, and have a mature relationship with the business…moving to the cloud will be much easier.
- Business imperative to migrate – if you have a business problem you are trying to solve that the migration project can “piggyback” on top of, this will make things much easier. Often this can be an M&A event, a large strategic project, or a mandate to modernize.
Of course, the faster the migration, the more likely it will be that you will need to bring in 3rd party assistance. Most IT organizations have designed their staffing requirements around “keeping the lights on”, and are not staffed for large scale migrations. In addition, migrating to the cloud often represents a skillset that existing IT doesn’t have today. Therefore, bringing in a 3rd party who has done large scale outsourcing in the past will be the logical choice to help in this endeavor.
As for roles needed for various phases of the project, you will need:
- Inventory analysis – This analysis will go far beyond the kinds of information currently stored in asset management systems. Cloud migrations require not only an exhaustive analysis of servers (server configurations, performance metrics, etc), but also applications deployed and their dependencies. There are many 3rd parties with specialized tools that are designed to tease out these important details with the goal of determining “low hanging fruit” for migration.
- Business and technical Analyst – Once the Inventory analysis is complete, there is now a need for a business and technical analyst to do the following:
- Define the business case for migration and present to management
- Define the target environment
- Map the existing environment to the target environment
- Prioritize the infrastructure that should move first.
- Migration technical resources – these are the individuals who will actually move the infrastructure over. These include:
- Network engineer – Before anything can be moved, the networking must be designed and configured per the technical plan.
- Virtualization engineer – This role handles any P to V (Physical to Virtual) work needed to be done to move the VM to the cloud. Often this is a very specialized skill that may need to be externally sourced.
- Project management – I can't overstate how important this role is. Migrating production systems requires extensive collaboration and scheduling with the business.
- DBA – Having an excellent database administrator involved full time in the migration is critical for success. Whether the database resides in a Virtual Private Cloud hosted on premise (but connected to the public cloud over a WAN infrastructure), or is actually moved to the cloud, the database will be a very sensitive part of the move. This must be a senior-level DBA resource, which many companies do not currently employ.
- Cloud Operations Team – These individuals should be staffed for ongoing deployment support. As we have discussed earlier, the skills required to manage a public cloud environment are very different from those needed to manage an on premise infrastructure. The cloud operations team must be very well versed in virtualization, and understand the limitations of the cloud services you are consuming. These can be the same individuals who currently support the on-premise environment, but they will require training, and perhaps initial oversight by a 3rd party until the transition is complete.
An Enterprise IT operation can be incredibly complex, and of course is mission critical to any business. Because of these complexities and the risks involved, many IT organizations are rightfully concerned about a move to the cloud. However, seldom has there been an opportunity to shave potentially tens of millions of dollars off of an IT budget AND transform an entire organization the way a migration to the cloud can. It is worth the risk. So how can these risks be mitigated? Quite simply…time.
As with any large scale effort, a cloud migration can and should take years with careful planning. If there is anything agile development has taught us in the software development world…it is the importance of shorter timelines and well defined smaller deliverables. A cloud migration should be handled in a similar way. Cloud migration projects should be very small initially to prove out the business case for migration, AND allow for IT organizational muscle to be built up and tested over time.
CAPEX and timelines
As discussed earlier, CAPEX can make a business case a real challenge in comparison to OPEX. However, CAPEX can play a huge role in defining the timeline for migration. Most server assets are depreciated over 3-5 years. In almost every case, a move to cloud cannot be cost justified for assets just starting their depreciation cycle; a brand new server that has just been purchased needs to be "used up" first. The best candidates for migration are almost always servers at the end of their useful life. In addition, in almost every case a new project should go to the cloud by default, as the cloud will be less expensive over the term of the project.
If an Enterprise IT organization takes this approach, a cloud migration can be accomplished entirely within a 3-5 year period if new projects are put in the cloud, and old depreciated servers are migrated to the cloud. This will cause the least organizational disruption, and allow for IT to develop “organizational muscle” around cloud.
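The depreciation-driven cadence above can be sketched as a simple schedule: new workloads go to cloud on day one, while existing servers become migration candidates only once fully depreciated. The fleet ages and the 4-year schedule below are hypothetical:

```python
# Illustrative sketch of a depreciation-driven migration cadence.
# The fleet composition and depreciation period are hypothetical assumptions.

def migration_year(server_age_years, depreciation_years=4):
    """Years from today until a server is fully depreciated (0 = migrate now)."""
    return max(0, depreciation_years - server_age_years)

fleet = {0: 20, 1: 35, 2: 40, 3: 30, 4: 25}  # hypothetical: server age -> count

schedule = {}
for age, count in fleet.items():
    year = migration_year(age)
    schedule[year] = schedule.get(year, 0) + count

for year in sorted(schedule):
    print(f"Year {year}: {schedule[year]} servers become migration candidates")
```

Even this toy schedule shows the shape of the program: a modest wave of end-of-life servers moves immediately, and the rest of the estate rolls over naturally within the depreciation window, with no asset retired before its CAPEX has been used up.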
In summary, there is HUGE value for any Enterprise IT organization in moving its infrastructure to the public cloud, because of cost savings, elasticity, and the benefits of OPEX vs CAPEX. However, proving this can be difficult and time consuming because of a lack of detailed inventory systems, the complications of CAPEX vs OPEX comparisons, and the challenge of budgeting for something as inherently unpredictable as elastic cloud consumption. But the move can and should be undertaken given the enormous benefits to any Enterprise IT organization. A cloud project should span multiple years, and focus first on new projects and retiring old servers rather than a "migrate the whole enchilada" approach.