How cloud cost optimisation saves money and increases efficiency
Aaron East is an eSynergy Solutions lead associate and cloud solutions architect, working with both large and small organisations on cloud adoption and migrations to AWS and Azure. He has hands-on experience working on cloud-native platforms like Kubernetes (EKS, GKE, AKS) along with managed services for data platforms. Aaron’s first cloud project was for O2, in 2011, working on AWS (before they introduced VPCs).
Defining, managing and optimising costs represents one of the bigger challenges when running services in the cloud. At eSynergy, we have helped many clients adopt and migrate to the cloud and, more often than not, cloud financial management is something of an afterthought. It is an activity that normally follows the realisation that costs are growing steadily; only when spend starts to exceed expectations do organisations tend to react…
Cost-saving on the cloud
A quick Google search will reveal a number of resources stating the ‘top ten ways to cost save on the cloud’. What’s missing here is how we actually do this; what tools are available; how we approach the problem; what effort is involved; what the risks are and, of course, the return on investment (ROI).
We’re going to look at this from a retrospective angle, i.e. considering a company that already hosts in the cloud. In this fictional example, AWS costs have been steadily growing and they are now becoming a concern. We’ll look at a medium to large system, made up of standard EC2 deployments of a three-tiered application (webserver, app server, db)…
Our fictional company
Aaron Credit History is a B2B business, offering credit history to businesses doing background checks. They deliver APIs and back-office solutions to customers, and administration systems for their employees.
Total Cost of Ownership (TCO)
The company has a sizeable system of 100 x t3.xlarge and 100 x m4.large instances.
How do we reduce TCO?
Instances are billed on-demand and the solution is not cloud-native: a two- or three-tier system of application load balancer, application server and database (there could also be a proxy server or webserver for front-end and static content).
Let’s start with the obvious and simple changes we can make to reduce overall TCO (check out the Aaron Credit History TCO Calculator)…
We have a number of options to make savings from procuring reserved instances. With AWS, it depends on how much you want to pay upfront and how long you’re willing to sign a contract for (the more you pay upfront and the longer you are willing to sign a contract for, the more savings you’ll make).
There are a number of factors you should take into consideration when choosing your reserved instance agreement…
If Aaron Credit History were to convert all the largest instances (t3.xlarge) to one-year reserved instances, their monthly savings would be approximately US$14,788.47, or $177,462.64 annually. But there is a one-off fee of $97,200.00, so the net annual saving is actually $80,262.64, or $6,688.55 per month. Aaron Credit History was spending $922,311.36 a year, so this equates to a saving of 8.70%.
This seems like a very quick win, but it raises a few questions: Are any of these workloads planned for decommission? Are these workloads correctly sized? Has a PO been raised for $97,200.00? It looks like a straightforward change but there are some implications that need to be looked at, prior to implementation.
Let’s look at some other options in the meantime and come back to this… Now let’s look at the same option but with a three-year reserved instance…
The monthly bill from AWS will stay the same but, if we spread the three-year upfront cost across the three years, you can clearly see the savings. If Aaron Credit History were to convert all the largest instances (t3.xlarge) to three-year reserved instances, their monthly savings would be approximately $14,788.47, or $177,462.64 annually. There is a one-off fee of $192,198.00 which, divided across three years, is $64,066.00 a year, so the net annual saving is actually $113,396.64, or $9,449.72 per month. Since Aaron Credit History spends $922,311.36 annually, this equates to a 12.3% saving. This is not a tremendous difference from the 8.7% saving we get with the one-year reserved instances, and we need to come up with $192,198.00, so you can see it’s a big commitment.
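The one-year and three-year comparisons above follow the same arithmetic: take the gross annual saving, subtract the upfront fee spread evenly over the contract term, and express the result against annual spend. A minimal sketch of that calculation, using only the figures quoted in this article, might look like the following. The function name `reserved_savings` and its parameters are hypothetical and are not part of any AWS tool or of the TCO calculator.

```python
def reserved_savings(annual_gross_saving, upfront_fee, term_years, annual_spend):
    """Return (net annual saving, net monthly saving, saving as % of annual spend)."""
    # Spread the one-off upfront fee evenly across the contract term.
    amortised_fee = upfront_fee / term_years
    net_annual = annual_gross_saving - amortised_fee
    net_monthly = net_annual / 12
    pct_of_spend = 100 * net_annual / annual_spend
    return round(net_annual, 2), round(net_monthly, 2), round(pct_of_spend, 1)


ANNUAL_SPEND = 922_311.36  # on-demand annual spend quoted in the article
GROSS_SAVING = 177_462.64  # annual saving before the upfront fee is deducted

one_year = reserved_savings(GROSS_SAVING, 97_200.00, 1, ANNUAL_SPEND)
three_year = reserved_savings(GROSS_SAVING, 192_198.00, 3, ANNUAL_SPEND)

print("one-year:  ", one_year)
print("three-year:", three_year)
```

Amortising the fee this way makes the two terms directly comparable, but it says nothing about cash flow: the full upfront fee still has to be paid at the start of the contract.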
The biggest cost
RDS, at $24,993.00 per month or $299,916.00 annually, is 33% of TCO. At present, they are using db.r4.xlarge, which could be correct but is questionable given the set-up and the volumes of I/O (you can capture these metrics from Amazon CloudWatch and check that the trend is not going upward).
In this example, we modify all instances, but this is not advisable in practice. It’s better to start small, with a single workload, measure the impact and understand what service disruption the downsizing causes. Based on our analysis, Aaron Credit History believes that the right size for their workloads is db.r4.large. By making this change, we see a significant cost reduction, from $24,993.00 per month to $12,549.00 per month. This alters annual spend from $922,311.36 to $762,530.40, which represents a 17.3% reduction in TCO.
Aaron Credit History TCO – rightsized DBs
It’s important to understand that rightsizing is not a one-off exercise. Systems evolve and so does usage, so continually measuring and ensuring rightsizing throughout the life of your systems is an exercise in good cloud hygiene and could keep costs from spiralling out of control. For AWS, you can look at the AWS Compute Optimizer to help measure usage and continually find underused workloads.
When a resource is below 40% utilisation, has no peaks that exceed 70% utilisation and shows no upward trend, I would downsize as soon as possible. Now let’s say it’s at 50% utilisation but on an upward curve: two weeks ago it was at 45% and the week before that it was at 40%. Based on that trend, the application could eventually become over-utilised, so you should be thinking about how this resource will scale. It’s also useful to look at peaks and troughs to see whether minimum utilisation is very low, as this could be an opportunity to use elasticity to meet demand and scale down to avoid long periods of under-utilisation.
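The rule of thumb above can be written down as a small decision function. This is only a sketch under stated assumptions: the function name `rightsizing_advice`, the input format (weekly average utilisation percentages, oldest first) and the crude first-versus-last trend check are all illustrative choices, not something taken from AWS Compute Optimizer or any other tool.

```python
def rightsizing_advice(weekly_avg_utilisation, peak_utilisation,
                       low_threshold=40.0, peak_threshold=70.0):
    """Classify a resource from weekly average utilisation (%) samples, oldest first."""
    if not weekly_avg_utilisation:
        raise ValueError("need at least one utilisation sample")

    current = weekly_avg_utilisation[-1]
    # Assumption: call it an upward trend if the latest sample is higher than
    # the oldest one. A fitted slope over more samples would be more robust.
    trending_up = (len(weekly_avg_utilisation) > 1
                   and current > weekly_avg_utilisation[0])

    # Below 40%, no peaks above 70%, no upward trend: downsize.
    if (current < low_threshold
            and peak_utilisation <= peak_threshold
            and not trending_up):
        return "downsize"
    # Utilisation is climbing: think about how the resource will scale.
    if trending_up:
        return "plan for scaling"
    return "keep and monitor"


print(rightsizing_advice([35, 34, 33], peak_utilisation=60))
print(rightsizing_advice([40, 45, 50], peak_utilisation=65))
print(rightsizing_advice([55, 55, 55], peak_utilisation=90))
```

The thresholds are parameters rather than constants so that they can be tuned per workload; the 40% and 70% defaults are simply the figures used in this article.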
Let’s look at an option where IT can make all the necessary changes and the finance department can simply see a drop in monthly spend…
Rather than utilise reserved instances with an upfront cost, let’s look at three options:
- No upfront reserved
Here is the TCO calculator where we’ve applied the RDS rightsizing but where we’re using on-demand EC2 with a monthly spend of $63,544.20:
- 100 x t3.xlarge instances, where the set-up is two instances per service with one ALB, giving 50 services.
- The same goes for the 100 m4.large instances, so 50 services (2 x m4.large instances) and 50 ALBs.
So, let’s say our analysis found that 25 services, or 25% of our services, were under-utilised: 20 m4.large instances and 30 t3.xlarge instances. Let’s say the options are to move m4.large to t3.medium and t3.xlarge to t3.large, and to buy the downsized instances as one-year reserved instances with no upfront cost. The monthly spend is now $54,046.20; without these changes it was $63,538.95, so this is a reduction in monthly spend of $9,492.75. The annual spend would be $648,554.40, down from $762,652.80, which is an annual saving of $114,098.40, or a 15% reduction in TCO.
Saving costs through elasticity
Elasticity is another means of saving costs. Instead of two t2.xlarge instances, we could have three t2.large instances, where two are active and the other is only introduced when there is a spike in traffic…
Let’s say we found 10 services that use their t2.xlarge instances heavily at peak times and under-utilise them in times of low traffic. We move each pair of t2.xlarge instances to two t2.large instances and, when utilisation exceeds 70%, spin up a third instance, which roughly accounts for six hours of the day. The monthly spend would then be $52,373.61, compared to $54,046.20: a saving of $1,672.59 per month, or $20,071.08 annually, which is a 3% reduction in TCO. When you factor in all changes, this represents an approximate 31.9% reduction in TCO, from an annual spend of $922,331.20 to $628,483.32, with annual savings of $293,847.88.
As you’ll note, it’s important to understand your utilisation as well as how it’s evolving: not all rightsizing exercises will result in cost savings. When utilisation is trending upwards, this will cause an increase in spend as your services grow in demand and hopefully in revenue too.
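To see why elasticity saves money, it helps to compare instance-hours for a single service. The sketch below is illustrative only: the hourly rates are placeholder values (not real AWS prices and not the figures behind the TCO calculator), the 730-hour month is a common billing approximation, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # common approximation used for monthly estimates


def monthly_cost(hourly_rate, instance_count, hours_per_day=24.0):
    """Cost of `instance_count` instances, each running `hours_per_day` hours a day."""
    return hourly_rate * instance_count * HOURS_PER_MONTH * (hours_per_day / 24.0)


def elastic_vs_fixed(large_rate, xlarge_rate, burst_hours_per_day=6.0):
    """Compare two always-on xlarge instances with two large plus one burst instance."""
    fixed = monthly_cost(xlarge_rate, 2)
    elastic = (monthly_cost(large_rate, 2)
               + monthly_cost(large_rate, 1, burst_hours_per_day))
    return fixed, elastic, fixed - elastic


# Placeholder rates: assume the xlarge size costs twice the large size per hour.
fixed, elastic, saving = elastic_vs_fixed(large_rate=0.10, xlarge_rate=0.20)
print(f"fixed: {fixed:.2f}  elastic: {elastic:.2f}  saving per service: {saving:.2f}")
```

The third instance only contributes a quarter of a month of instance-hours, which is where the saving comes from. Real results depend on actual pricing and on how long the burst capacity runs, so the burst hours should come from measured utilisation rather than an estimate.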
ARM chipsets for EC2 and RDS
New instance types have been introduced with AWS Graviton2, which uses ARM-based processors. The savings can be as high as 30% but it all depends on the instance types you’re using (if they’re small, there’s unlikely to be an equivalent Graviton instance type). The other issue is that if you change any of your EC2 instances, you’ll need to update the OS and the configuration and ensure the applications on the instance support ARM-based chipsets. There is an easy win, though: RDS, which you can upgrade without having to make any configuration changes.
Converting all Aaron Credit History RDS instances from db.r4.large to the equivalent Graviton2 db.r6g.large sees a reduction in monthly RDS spend from $12,549.00 to $11,231.50, giving monthly savings of $1,317.50 or 2.5% of TCO. The overall spend now is $628,483.32, from $922,331.36, which is a 32% reduction in TCO (see calculator for TCO details).
So, in conclusion, what’s the best way to reduce costs on the cloud, whilst guarding against risk? In short, companies should set up a rightsizing strategy: something that can be checked monthly to ensure that resources are not under- or over-utilised. TCO can also be reduced through reserved instances and elasticity. It’s important to understand utilisation as well as how it evolves. Look at how resources will scale and keep an eye on potential peaks and troughs.
Together with the above, the tools below will help you to achieve cost optimisation and simplify cloud financial management:
- AWS CloudTrail
- AWS Compute Optimizer
- Log aggregation tools like Datadog, ELK Stack, etc.
Talk to us about cloud cost optimisation
Our team of experts is here to help accelerate your cloud cost optimisation. Get in touch today.