Cloud Service Selection and Cost Optimization

Abstract

Challenges: Cloud computing offers great flexibility but also makes it difficult to plan cloud infrastructures and their future costs. A key challenge is to estimate the dynamically changing usage of cloud services. Creating a reasonable estimate for Infrastructure as a Service (IaaS) requires detailed knowledge about the software services operated on this infrastructure, the development of the corresponding products within their markets, and cloud providers’ pricing models; it also requires managing the uncertainties in the estimate. Furthermore, there is so far no structured approach for documenting the plans and their underlying assumptions in a transparent and traceable way.

Contribution: This research project addresses these challenges in cloud infrastructure planning and cost management by engineering a process model that gives its users a framework for planning and cost management. To ease the adoption and use of this process model, a software prototype is being developed to support its users.

Introduction

Cloud computing providers offer many services that can be obtained almost instantaneously, granting their customers flexibility. The often advertised elasticity of cloud services, i.e., their seemingly limitless availability, is especially interesting for scalable software services. However, because the usage of cloud services can change constantly, planning costs has become much more difficult than simply multiplying a fixed cost per period by the number of periods of interest. This is especially true for mid- to long-term plans, i.e., plans covering not a single month but many months or even years.

With flexibly used cloud computing, planning costs requires knowledge about how cloud service usage will develop and about the cloud providers’ pricing models. In the case of Infrastructure as a Service (IaaS), understanding how usage will develop requires technical knowledge about the software services deployed on the compute infrastructure and knowledge about the product that each software service represents. On the software side, the infrastructure requirements, the services’ network communication, and their scaling behavior must be known. On the product side, expectations about the development of the product within its market are required. It must also be known how the product’s development, e.g., a rising number of users, affects the workload of the corresponding software services. Once the required infrastructure has been planned, the next step in calculating costs is to apply the sometimes complex pricing models. Thus, planning costs is a tedious task that involves broad knowledge, which often does not lie with a single employee of an organization but requires collaboration that needs to be organized.

Planning future costs comes with uncertainties. Every plan is based on assumptions that can prove to be wrong. For example, the product may be adopted much more or much less than expected, leading to a higher or lower compute infrastructure demand; similarly, the introduction of new features may change the software services’ performance and thus impact the workload. The plan can also become outdated when new software services are added or others become obsolete. Thus, the actual and the planned infrastructure may deviate, and as a result the costs may develop differently than expected.

Since there are many possible causes for a deviation between actual and planned infrastructure, identifying the actual causes is not trivial. Yet, knowing what caused a deviation may be important for responding appropriately. Further, knowing the cause makes it possible to learn from mistakes and false assumptions made in the past and thereby to improve future plans.

Research questions

In this research project, we address the following research questions:

What is an effective method to select cloud computing services and to estimate their future costs while minimizing and easing the manual labor involved, supporting collaboration, and considering uncertainties?
What is an effective method for documenting estimates of future cloud computing costs transparently, e.g., to allow tracing false estimates to mistakes or erroneous assumptions?

Solution approach

To answer the research questions, we are engineering a process model for planning and monitoring cloud computing costs with a focus on Infrastructure as a Service (IaaS). The intended users of this process model are stakeholders involved in the management of cloud computing costs, for example, product owners and employees dedicated to cloud infrastructure management. The process model is composed of four steps as depicted in the figure below. We intend to develop prototype software tools that assist the users of the process model in implementing it and in organizing collaboration among them. In the following, we describe each of the four steps.

Figure: Process model for planning and monitoring compute infrastructure costs

Step 1: Plan infrastructure

The process begins with the planning of a compute infrastructure for a user-defined composition of software services. This allows considering anything from a single web service to a large organization’s service landscape.

The planning begins with the definition of the requirements that the software services have regarding the compute infrastructure. The requirements are modelled using a domain-specific language designed for this purpose. The language allows users to express the need for infrastructure services, e.g., virtual machines, block storage, or load balancers. It also allows them to specify the attributes the needed infrastructure services must have, e.g., a virtual machine’s number of CPUs, amount of memory, storage, and GPU acceleration. The language further allows them to express network links and the expected traffic on these links.
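The syntax of the domain-specific language is not shown here. Purely as an illustration of the kind of information it captures, the requirements could be sketched with plain Python data classes. All class and field names below are hypothetical and do not reflect the actual language.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VirtualMachineRequirement:
    """Attributes a needed virtual machine must have (hypothetical fields)."""
    name: str
    cpus: int
    memory_gib: int
    storage_gib: int
    gpu: bool = False


@dataclass(frozen=True)
class NetworkLink:
    """A network link between two modelled services and its expected traffic."""
    source: str
    target: str
    expected_traffic_gib_per_month: float


@dataclass
class InfrastructureRequirements:
    """Requirements of a user-defined composition of software services."""
    virtual_machines: list[VirtualMachineRequirement] = field(default_factory=list)
    links: list[NetworkLink] = field(default_factory=list)

    def total_traffic_gib_per_month(self) -> float:
        # Sum of the expected traffic over all modelled links.
        return sum(link.expected_traffic_gib_per_month for link in self.links)


# Example values are made up for illustration.
requirements = InfrastructureRequirements(
    virtual_machines=[
        VirtualMachineRequirement("web", cpus=2, memory_gib=4, storage_gib=20),
        VirtualMachineRequirement("db", cpus=4, memory_gib=16, storage_gib=200),
    ],
    links=[NetworkLink("web", "db", expected_traffic_gib_per_month=50.0)],
)
```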

A key feature is the definition of the expected workload development, which allows planning for a longer period. The workload development is the expectation of how the considered software services will scale in the mid- to long-term future, i.e., not only within a month but possibly over a few years. The idea is to incorporate expectations about the future development of the software services into the planning. For example, if we expect a web service whose number of users doubles every six months for the next two years, a static plan would soon diverge drastically from such an exponential development. One issue with mid- to long-term expectations is that uncertainty grows over time. Consider the example of the increasing number of users: what if the assumption was wrong and the number of users doubles not every six but only every twelve months?
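To make the example concrete, assume (as a simplification that the project description does not state) that the number of users grows smoothly from an initial value u_0 and doubles every T months. After t = 24 months, the two doubling periods from the example give:

```latex
u(t) = u_0 \cdot 2^{t/T}
\qquad\Rightarrow\qquad
\frac{u(24)}{u_0} =
\begin{cases}
2^{24/6} = 16 & \text{if } T = 6 \text{ months},\\
2^{24/12} = 4 & \text{if } T = 12 \text{ months}.
\end{cases}
```

Under this assumption, the two expectations differ by a factor of four after two years, while a static plan would still assume u_0 users.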

To account for uncertainties, the process model allows its users to model multiple scenarios. Each scenario represents one possible future development and contains the correspondingly modelled requirements and workload development. This mechanism gives users the flexibility either to create a single scenario for their future workload estimate if they are confident about the future, or to create multiple scenarios, e.g., a best-case, a most probable, and a worst-case scenario, if they are less confident and want to plan with these uncertainties.
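As a minimal sketch of this idea, scenarios can be represented as small data objects that each carry their own growth assumption. The `Scenario` class, its fields, the scenario names, and all numbers are hypothetical; smooth exponential growth is an assumption made only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One possible future development (hypothetical structure)."""
    name: str
    initial_users: int
    doubling_period_months: int

    def expected_users(self, month: int) -> float:
        # Assumed smooth exponential growth: users double every
        # `doubling_period_months` months.
        return self.initial_users * 2 ** (month / self.doubling_period_months)


scenarios = [
    Scenario("fast growth", initial_users=1000, doubling_period_months=6),
    Scenario("expected growth", initial_users=1000, doubling_period_months=9),
    Scenario("slow growth", initial_users=1000, doubling_period_months=12),
]

# Expected number of users per scenario after two years.
forecast = {s.name: s.expected_users(24) for s in scenarios}

for name, users in forecast.items():
    print(f"{name}: about {users:.0f} users after 24 months")
```

Keeping each assumption inside its scenario also documents it, which makes it easier to trace a later deviation back to the assumption that turned out to be wrong.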

The scenarios serve as input for a cost comparison. The costs for each scenario are determined by an optimization that finds the best-matching infrastructure services for the modelled requirements. For example, it determines the lowest-cost virtual machines that match the given properties. While at first glance this optimization may sound like a rather simple task, there is much complexity involved. Finding a cost-optimal infrastructure setup in general is an NP-complete problem and as such presumably hard to solve. Reasons for this complexity are the pricing models of cloud providers and network dependencies.
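The simplest part of this task, picking the cheapest virtual machine that satisfies one requirement, can be sketched as follows. This is not the project's optimization: it treats each requirement in isolation, and the catalog, offer names, and prices are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VmOffer:
    """A catalog entry; names and prices are made up for illustration."""
    name: str
    cpus: int
    memory_gib: int
    price_per_hour: float


def cheapest_matching_offer(
    offers: list[VmOffer], min_cpus: int, min_memory_gib: int
) -> Optional[VmOffer]:
    """Return the lowest-priced offer meeting the minimum CPU and memory needs,
    or None if no offer qualifies."""
    candidates = [
        o for o in offers
        if o.cpus >= min_cpus and o.memory_gib >= min_memory_gib
    ]
    return min(candidates, key=lambda o: o.price_per_hour, default=None)


catalog = [
    VmOffer("small", cpus=2, memory_gib=4, price_per_hour=0.05),
    VmOffer("medium", cpus=4, memory_gib=16, price_per_hour=0.20),
    VmOffer("large", cpus=8, memory_gib=32, price_per_hour=0.40),
    VmOffer("high-mem", cpus=2, memory_gib=32, price_per_hour=0.30),
]

choice = cheapest_matching_offer(catalog, min_cpus=2, min_memory_gib=8)
print(choice.name if choice else "no matching offer")
```

Selecting each service independently like this ignores network dependencies between services and the structure of providers' pricing models, which are the sources of complexity named above and the reason an actual optimization is needed.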

Step 2: Implement infrastructure

With the plans set, the implementation begins. It is intentionally left open how the infrastructure is implemented as there are many different approaches and technologies regarding the implementation of compute infrastructures. For example,

for a small number of individual servers it may make sense to obtain these manually via a cloud provider’s user interface;
for a higher degree of automation, the use of Terraform may be preferred;
for larger cluster setups a managed Kubernetes may be the first choice; and
for special cases an individual infrastructure management via the cloud providers’ APIs may be implemented.

To accommodate the many reasonable approaches, the users are free to set up the compute infrastructure in their preferred way. We assume that they will try to obtain cloud services according to what was planned before. The emphasis here is on “try” since plans cannot consider every possible development and may rest on false assumptions. Recognizing and dealing with deviations from the plan is left to steps 3 and 4 of the process model.

Step 3: Monitor infrastructure costs

As time passes, the implemented compute infrastructure steadily generates costs. Further, the infrastructure may change from time to time. Reasons for this can be, for example:

a scaling of the infrastructure according to the workloads as planned and modelled in the scenarios;
an unexpected scaling of the infrastructure due to an increase in the usage of the deployed software services that was not modelled in the scenarios; and
the addition of new software services to the service landscape that require their own compute infrastructure not yet considered in the planning.

To recognize deviations between the actual and planned costs, it is therefore necessary to monitor the infrastructure and the costs it generates and to compare them against the scenarios of the plan created in step 1. This step is intended to be supported by a software assistant that provides interested stakeholders with a view of the monitored actual costs in comparison to the planned costs of one or more scenarios. This allows the users to recognize deviations. Step 4 of the process model deals with reasoning about deviations.
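A minimal sketch of such a comparison, assuming monthly cost totals are available for both the plan and the actual infrastructure, might look as follows. The function name, the data layout, the 10 % threshold, and all cost figures are hypothetical and are not taken from the project's software assistant.

```python
def find_deviations(
    planned: dict[str, float], actual: dict[str, float], tolerance: float = 0.10
) -> dict[str, float]:
    """Return the months whose actual cost deviates from the planned cost by
    more than `tolerance`, mapped to the relative deviation.

    `planned` and `actual` map a month label to a cost. `tolerance` is a
    relative threshold (0.10 means 10 %); the default is an arbitrary choice.
    """
    deviations = {}
    for month, planned_cost in planned.items():
        if month not in actual or planned_cost == 0:
            continue  # nothing to compare against for this month
        relative = (actual[month] - planned_cost) / planned_cost
        if abs(relative) > tolerance:
            deviations[month] = relative
    return deviations


# Planned costs of one scenario and monitored actual costs (made-up numbers).
planned_costs = {"2021-01": 1000.0, "2021-02": 1100.0, "2021-03": 1200.0}
actual_costs = {"2021-01": 1020.0, "2021-02": 1400.0, "2021-03": 900.0}

deviations = find_deviations(planned_costs, actual_costs)
for month, relative in deviations.items():
    print(f"{month}: actual cost deviates by {relative:+.0%} from the plan")
```

With several scenarios, the same comparison could be run once per scenario to see which one the actual costs follow most closely.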

Step 4: Act on deviations between planned and actual costs

Should the software assistant inform stakeholders about deviations between the actual and the planned costs that they consider significant, it may be time to act on the deviations. Like step 2, this step is intentionally left open to the users of the process model. There may be many reasonable actions, for example:

Update the plans: If the situation changed in a way that was not anticipated by the previous plan, the deviation may be considered reasonable. In that case, it may be necessary to update the plan to obtain a reasonable estimate of the future costs again. This means starting the process again from step 1 by creating updated scenarios for a new infrastructure plan. Optionally, implement any changes between the current and the newly planned compute infrastructure in step 2. Then, enter the monitoring in step 3 again until the next deviation occurs.
Manage the compute infrastructure: If the compute infrastructure does not match the previous plan and the deviation is considered unreasonable, then the compute infrastructure needs management. Examples of such unreasonable deviations are compute infrastructure that was allocated but never deallocated, or that was allocated without a reasonable need. A management action could be to deallocate unnecessary compute infrastructure to return to the planned state, as well as to establish new policies to prevent similar deviations in the future.
Ignore the deviation: If the deviation does not pose a problem and there is currently no other need or priority to act, it may even be reasonable to just ignore it for now.

Contact

Christian Plewnia

External PhD Candidate

plewnia@swc.rwth-aachen.de

Project information

Researchers:

Christian Plewnia

Project start & end:

2020 – 2023