How to keep the costs of time series databases under control

A lot of engineering and R&D teams are switching to databases to store test and measurement data.
In our conversation with Atlas Copco, we dove into the advantages this approach brings to your organisation.

However, as data volumes grow, so do the costs associated with running a database. This blog post explores various strategies to keep these costs under control without compromising on performance or functionality.

Five strategies to control database costs

1. Optimise how you query

A big part of the cost of a database is often the computation power: how much RAM and CPU is assigned to the database instance. You might currently be paying for a heavy instance just to handle a few heavy queries.
There are a lot of strategies you can apply to lighten the load these queries put on your database:

  • Make use of indexing (on one or multiple columns)
  • Make pre-calculated views (e.g. materialised views in Postgres)
  • Use application-side caching to prevent some queries from hitting the database
  • Limit the amount of data you pull out (e.g. subsample/aggregate at the end of the query)
  • Segment the data into multiple tables for faster retrieval


By focusing on these optimisations, you can often decrease hardware resources while maintaining similar performance.
When you start optimising, always begin with the event logs or monitoring tools of your database. They will tell you which queries are the most expensive, and that is where you will gain the most.
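
To make this concrete, below is a minimal sketch of three of these techniques: indexing, aggregating inside the query, and application-side caching. It uses an in-memory SQLite database as a stand-in for your production system; the table, columns and data are purely illustrative:

```python
import sqlite3
from functools import lru_cache

# In-memory SQLite stands in for your production database here;
# the same ideas apply to Postgres, InfluxDB, etc.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sensor_id TEXT, ts REAL, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [("temp_1", t, 20.0 + (t % 7)) for t in range(100_000)],
)

# 1. Index the columns your queries filter on.
conn.execute("CREATE INDEX idx_sensor_ts ON measurements (sensor_id, ts)")

# 2. Aggregate inside the query so only the reduced result leaves the database.
# 3. Cache the result application-side so repeated calls never hit the database.
@lru_cache(maxsize=128)
def hourly_averages(sensor_id: str) -> list:
    cur = conn.execute(
        """
        SELECT CAST(ts / 3600 AS INTEGER) AS hour, AVG(value)
        FROM measurements
        WHERE sensor_id = ?
        GROUP BY hour
        ORDER BY hour
        """,
        (sensor_id,),
    )
    return cur.fetchall()

print(len(hourly_averages("temp_1")))  # ~28 aggregated rows instead of 100,000 raw ones
```

The same pattern carries over to Postgres or InfluxDB: filter on indexed columns, reduce the data server-side, and cache repeated reads in the application.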

2. Virtual machines are not that bad

Managed services such as AWS RDS, Azure SQL Database or InfluxDB Cloud are convenient and fast to set up. They come with batteries included, such as automated backups, performance monitoring and auto-scaling.

These services come with a premium price tag, however, and there is a cheaper alternative: running the database yourself on a bare virtual machine. On AWS, for example, managed databases are roughly twice as expensive as a VM with the same specifications:

(Pricing comparison screenshots from https://instances.vantage.sh)

3. Pool resources between applications

Instead of maintaining separate database instances for each application, consider pooling resources. Most databases support this: Postgres, for example, can host multiple databases inside a single instance.

One of our customers uses InfluxDB across multiple projects. They have months when they are testing more heavily for one project or another. By putting all this data into a single InfluxDB instance, organised into different buckets, they only pay for one instance (a small sketch of this setup follows below).

Two pitfalls should be avoided when implementing this:

  1. Implement proper security measures to ensure data isolation between applications or projects
  2. Monitor resource usage to prevent one application from impacting others
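
To illustrate the bucket setup, here is a minimal sketch using the influxdb-client Python package. The URL, token, organisation, bucket names and measurements are placeholders for your own deployment:

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# One shared InfluxDB instance; each project writes to its own bucket.
# url, token and org are placeholders for your own deployment.
client = InfluxDBClient(url="http://localhost:8086", token="SHARED_TOKEN", org="r-and-d")
write_api = client.write_api(write_options=SYNCHRONOUS)

# Both projects land in the same (single-billed) instance,
# but stay logically separated in their own buckets.
write_api.write(bucket="project_a", record=Point("vibration").tag("rig", "bench_1").field("rms", 0.42))
write_api.write(bucket="project_b", record=Point("thermal").tag("rig", "bench_2").field("celsius", 81.3))

# For pitfall 1: scope each team's API token to its own bucket
# instead of handing out an instance-wide token.
```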


4. Implement a two-tiered approach

Not all data needs to be instantly accessible. A two-tiered approach can effectively manage costs by strategically splitting your data into two parts. This method involves keeping frequently accessed (hot) data in the primary database for quick retrieval while moving less frequently accessed (cold) data to more cost-effective storage solutions, such as an S3 store.

To make this work in practice, it's crucial to provide a single, unified API over these two storage systems. This can be done by writing a thin API yourself or by leaning on tools such as Spark, Microsoft Fabric or Databricks.
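
As a sketch of what such a thin API could look like, the hypothetical read_measurements function below routes each request to hot or cold storage based on a cutoff age. The 90-day window, the S3 path, the ts column and the query_primary_db helper are all illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

HOT_WINDOW = timedelta(days=90)  # data newer than this lives in the primary database


def query_primary_db(sensor_id: str, start: datetime, end: datetime) -> pd.DataFrame:
    # Placeholder: run a time-range query against your hot database here.
    raise NotImplementedError


def read_measurements(sensor_id: str, start: datetime, end: datetime) -> pd.DataFrame:
    """One entry point that hides whether the data lives in hot or cold storage."""
    cutoff = datetime.now(timezone.utc) - HOT_WINDOW
    frames = []
    if end > cutoff:
        # Recent (hot) data comes from the primary database.
        frames.append(query_primary_db(sensor_id, max(start, cutoff), end))
    if start < cutoff:
        # Older (cold) data comes from Parquet files archived on S3;
        # pandas reads s3:// URLs via the s3fs / pyarrow extras.
        cold = pd.read_parquet(f"s3://my-archive-bucket/{sensor_id}.parquet")
        frames.append(cold[(cold["ts"] >= start) & (cold["ts"] < cutoff)])
    return pd.concat(frames).sort_values("ts")
```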

5. Consider non-traditional data stores

A traditional database might not always be the most cost-effective solution. If your data has an unusual shape or your applications have atypical requirements, it can be worth exploring alternatives such as NoSQL databases, or graph databases like Neo4j for highly interconnected data structures.

In some cases, you might not need a database at all. For applications with simple data structures, or those that don't require complex queries, a file-based storage system could be a more cost-effective solution. SQLite databases can be queried directly from S3 storage, and Parquet files allow random access on mounted file storage. Or, if you push it further and don't need to store the data at all, you could move towards a streaming architecture. Azure Event Hubs, for example, has recently been adopted by big R&D teams, with even lower latency than traditional databases.
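
To illustrate the Parquet route: because Parquet stores min/max statistics per row group, a reader can jump straight to the relevant time range without any database server involved. A minimal sketch with pyarrow, where the file path and column names are illustrative:

```python
import pyarrow.parquet as pq

# Only the row groups whose statistics can match the filter are read from disk,
# and only the requested columns are decoded: random access without a server.
table = pq.read_table(
    "measurements/run_042.parquet",
    columns=["ts", "value"],  # column pruning
    filters=[("ts", ">=", 1_700_000_000), ("ts", "<", 1_700_003_600)],  # row-group pruning
)
print(table.num_rows)
```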

A hidden catch to watch out for

The solutions above can help bring down your cloud provider bill. But they might also introduce a new hidden cost: the man-hours spent implementing and maintaining them.

Therefore, while implementing cost-saving measures, it's crucial to consider the total cost of ownership. This includes evaluating the manpower cost required to migrate, maintain, and optimise your new setup, as well as considering the potential impact on application performance and user experience. Additionally, factor in the cost of training your team on new technologies or approaches.

As always, a good place to start is to pick a few easy solutions that offer a lot of gain.

Conclusion

In the post above, we offered five strategies to bring down the cost of your databases. We have seen our customers successfully implement each of them to reduce their cloud provider bills. Do keep in mind to start with the easy changes: advanced solutions require a large time investment from the people implementing them, which might offset the cost you are trying to save.

At Marple, we specialise in the analysis of time series data. Don’t hesitate to get in touch if you have questions about the above content or want to learn more about how we can connect Marple to your database.

Want to learn more? Let's talk.
