Migrating from on-premises IT infrastructure to cloud services is a current trend in the industry.
Why maintain internal, physical servers when you can outsource them to the cloud, right?
Additionally, the cloud offers solutions with a high level of security, scalability, and effectiveness.
It all sounds great, but then there’s reality: you migrate to the cloud, yet the expected benefits turn out to be weak or even non-existent. What happened?
Many companies approach migration as an easy lift-and-shift process, transferring all of their code and infrastructure as-is to the cloud. While initial migration costs may be lower, running non-optimized applications in the cloud can result in higher ongoing operational costs compared to optimized cloud-native applications.
Another approach is re-architecting the applications to better leverage cloud-native features and services.
This approach looks better on paper, but it can still lead to high costs and poor results, because cloud services operate under different principles and require a tailored approach.
This is particularly important when using serverless services.
The data modeling methods we are accustomed to with on-premises solutions are often counterintuitive when applied to such services. Many developers find it challenging at first to adapt to this new way of thinking about infrastructure.
In this article, we will present approaches and methods that allow us to predict and control cloud costs associated with database usage.
Before the cloud, it was pretty easy and straightforward: the traditional app would start once and then operate in the background constantly, always needing server resources regardless of the actual web traffic.
Now, there are cloud-native solutions designed specifically for operating in the cloud. For example, AWS Lambda runs small functions that consume server resources only when they are invoked.
As a result, we use the processor and memory only when necessary and pay only for what we use. After the action is done, the cloud resources are released for someone else to use.
As you can see, this is much more effective globally – instead of thousands of private servers, we can all use the same virtual ones.
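To make this concrete, here is a minimal sketch of such a function in Python. The handler name and event shape are illustrative assumptions; the point is simply that compute is allocated per invocation and billed only for that invocation.

```python
import json

# A minimal AWS Lambda handler sketch. Nothing runs between invocations:
# an execution environment is created (or reused) only when a request
# arrives, and billing covers just the request count and execution time.
def lambda_handler(event, context):
    name = event.get("name", "world")  # hypothetical input field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```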
It works similarly with DynamoDB. AWS handles all aspects of server management, including hardware provisioning, software installation, configuration, patching, and backups. Users interact with DynamoDB through APIs and the AWS Management Console without needing to manage any servers. And you pay-as-you-go.
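For illustration, here is a minimal sketch of working with DynamoDB purely through its API using boto3; the table name, key schema, and attributes are assumptions made up for this example.

```python
import boto3

# No servers to provision, patch, or back up – an API call is all it takes.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Orders")  # hypothetical table with a PK/SK key schema

# Writing an item is billed in write units...
table.put_item(Item={"PK": "ORDER#1001", "SK": "METADATA", "total": 42})

# ...and reading it back by primary key is billed in read units.
response = table.get_item(Key={"PK": "ORDER#1001", "SK": "METADATA"})
print(response.get("Item"))
```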
However, to make it effective locally, just for your business, you need to understand how it works, get acquainted with the functions your provider delivers, and learn how to design and build applications that will use these functions to your advantage.
Amazon cloud services will serve as an example. AWS DynamoDB is a comprehensive database service that includes maintenance, scaling, updates, and security – a full package, ready to use without the need to physically install any software.
What’s important is that billing for this cloud service follows its own specific model.
In general, serverless computing is a method of providing backend services on an as-used basis. In the case of DynamoDB, billing is based mostly on read and write operations.
As a whole, DynamoDB charges for three things:
- write throughput, measured in write capacity units (WCUs) – one WCU covers one write per second of an item up to 1 KB;
- read throughput, measured in read capacity units (RCUs) – one RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads;
- the amount of data stored in the table.
This approach to billing is fundamental to the new way of thinking about the software architecture you need to adopt to optimize for the cloud. It’s worth pointing out that read and write consumption constitutes the majority of the final costs, so they are the main factors we need to consider.
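As a back-of-the-envelope sketch of those sizing rules (the item size below is an assumption), here is how the size of a single item translates into consumed capacity units:

```python
import math

# DynamoDB sizing rules: one RCU covers a strongly consistent read of an item
# up to 4 KB (an eventually consistent read costs half of that), and one WCU
# covers a write of an item up to 1 KB; both are rounded up per item.
item_size_kb = 6.5  # assumed average item size for this sketch

rcu_strong_read = math.ceil(item_size_kb / 4)   # -> 2 RCUs
rcu_eventual_read = rcu_strong_read / 2         # -> 1.0 RCU
wcu_write = math.ceil(item_size_kb / 1)         # -> 7 WCUs

print(rcu_strong_read, rcu_eventual_read, wcu_write)
```

Because the rounding happens per item, keeping items small and reading only what a given access pattern actually needs translates directly into a lower bill.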
A common mistake is assuming that serverless cloud services are always cheaper because you only pay as you go. This is not true. This model of infrastructure is governed by its own principles (described above) which dictate the usefulness of cloud services in individual cases.
Relational (SQL) databases, the first to dominate the market, are well suited for data analysis, helping to understand the context and relationships within the data and to extract meaningful information. However, they have a significant flaw: complicated scaling. Once there are hundreds of gigabytes of data or significant web traffic, they no longer run smoothly.
On the other hand, a NoSQL database (e.g., AWS DynamoDB) is a poor choice for data analysis but excels at OLTP (Online Transaction Processing), making it ideal for FinTech applications. If there are a lot of transactions and users, a NoSQL database will shine. It will enable constant read and write times regardless of the amount of data, but the data has to be modeled correctly for this type of database (this last part is very important!).
However, there is one more important caveat: to model the data correctly in a NoSQL database, we need to define access patterns. It’s not a suitable solution for applications with many unpredictable access patterns, dynamic queries, numerous filters, or potential significant changes in code in the future.
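The difference is easy to see in code. In the sketch below (table and attribute names are assumptions), a Query serving a known access pattern reads – and bills – only one partition, while an ad-hoc filtered Scan reads the entire table before discarding most of the items:

```python
import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("Transactions")  # hypothetical table

# Known access pattern: "all transactions for one account".
# Only the items in that partition are read and billed.
by_account = table.query(
    KeyConditionExpression=Key("accountId").eq("ACC#123")
)

# Unpredictable, ad-hoc filter: every item in the table is read and billed,
# even though the filter throws most of them away afterwards.
large_payments = table.scan(
    FilterExpression=Attr("amount").gt(1000)
)
```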
The first step of optimization is to choose the right tools.
The standard approach to data modeling for SQL databases aims to minimize storage, i.e., the quantity of saved data. Every piece of data is written just once, with a separate relational table for each type of information. Extracting the data can be computationally complex and time-consuming, but in this model that matters little.
This approach originated in a historical context where drives were expensive and computing power was relatively cheap. Today, the situation is practically reversed – storage is affordable, but computing power scales less effectively, making it more costly in the long run.
This change has led to a common mistake among developers: they often try to apply the same data modeling strategies in modern applications. Unfortunately, storage-minimizing designs no longer yield efficient ways of storing and extracting information at today’s scale. We need different styles of thinking and different skill sets.
Single Table Design is a solution introduced to address this challenge. In AWS DynamoDB, instead of creating separate relational tables, you create one summary table for a service, if possible, and model the data according to access patterns.
As a result, the number of queries is significantly reduced because the information needed together is saved jointly in one partition, allowing the query to retrieve the entire set of needed data.
This approach requires a thorough knowledge of the application’s usage and access patterns, so it is usually implemented later as an optimization rather than in the initial design.
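A simplified sketch of the idea (the table name and key layout are illustrative, not a prescription): customers and their orders share a partition key, so the access pattern “get a customer with all of their orders” becomes a single Query.

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("AppTable")  # hypothetical single table

# Different entity types share the partition key (PK); the sort key (SK)
# tells them apart, so related data lives together in one partition.
table.put_item(Item={"PK": "CUSTOMER#42", "SK": "PROFILE", "name": "Alice"})
table.put_item(Item={"PK": "CUSTOMER#42", "SK": "ORDER#2024-001", "total": 120})
table.put_item(Item={"PK": "CUSTOMER#42", "SK": "ORDER#2024-002", "total": 75})

# One Query returns the profile and every order in a single round trip.
result = table.query(KeyConditionExpression=Key("PK").eq("CUSTOMER#42"))
for item in result["Items"]:
    print(item["SK"], item)
```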
The specific payment methods for cloud services are also an important area for optimization. Earlier in this article, we explained RCUs and WCUs and how they work. Now, how do we optimize for them?
A thorough knowledge of access patterns allows us to predict the costs of a query. If we also know the average number of these queries, we can easily calculate the costs.
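As an example (the access patterns, volumes, and prices below are made-up placeholders – always plug in the current DynamoDB rates for your region and capacity mode), such a forecast can be as simple as a few lines of arithmetic:

```python
# Illustrative on-demand prices per million request units (assumptions only).
PRICE_PER_MILLION = {"read": 0.25, "write": 1.25}

# (access pattern, request type, request units per call, calls per month)
ACCESS_PATTERNS = [
    ("get customer profile", "read", 1, 2_000_000),
    ("list customer orders", "read", 4, 1_500_000),
    ("create order", "write", 6, 300_000),
]

total = 0.0
for name, kind, units_per_call, calls in ACCESS_PATTERNS:
    cost = calls * units_per_call / 1_000_000 * PRICE_PER_MILLION[kind]
    total += cost
    print(f"{name:<22} ~${cost:,.2f}/month")
print(f"{'total':<22} ~${total:,.2f}/month")
```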
These calculations should be the basis for decision-making. Any approach to development, any changes, or business decisions about the app should be confronted with an analysis of related costs.
Conversely, we can analyze existing solutions or different options in terms of the costs they generate and look for better, more cost-efficient approaches.
In short, cost forecasting should always be an important factor in any decision-making regarding cloud services.
One important issue emerges from this whole picture: it’s worth investing in training or hiring a specialist with experience in this technology.
This technology is so different from the standard approach and so unique that even experienced developers can encounter serious problems or make costly mistakes.
Your team should include at least one specialist with experience in optimizing databases for cloud services to avoid wasting resources, time, and money.