Philip Williams
on 25 July 2024
The choices you make around IT infrastructure have a significant impact on both business cost and performance, across areas as diverse as operations, finance, data analysis and marketing. Given the importance of data across all of these areas, and frankly across your business as a whole, making the right decision when choosing a new storage system is critical. In this blog post we will take a look at some of the factors to consider in order to ensure you balance cost effectiveness with performance.
Performance
There are multiple dimensions to storage performance. First, let's consider the simplest metrics:
IOPS – Input/Output Operations per Second, i.e. the number of operations that can be processed in a one second period.
Response time – The time taken for an IO operation to be processed and safely stored in a storage system and an acknowledgement sent to the requesting application.
Bandwidth – The measure of the volume of data that can be transferred in a single second.
Things become more complex when we consider the size of each IO, which affects the total amount of bandwidth utilised. Transferring 4KB takes less time than transferring 1MB, and therefore impacts the response time of an IO request. Let's look at two examples.
Databases
A database will typically use small IO sizes, with each operation updating only a small portion of a table, so the total amount of bandwidth utilised will be low. However, response time is critically important for this use case: the faster the database receives an acknowledgement that data has been safely written, the faster the next transaction can be processed.
Streaming
When editing multiple 4K video streams, a video editing application needs access to the data in its entirety, so the response time of each individual IO request matters less than transferring the entire video file as fast as possible, utilising all available bandwidth to the storage system.
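The two workloads above can be contrasted with a quick back-of-envelope calculation. The IOPS figures below are illustrative assumptions, not measurements of any particular system; the point is simply that bandwidth is IOPS multiplied by IO size.

```python
# Rough sketch of how IO size shifts the bottleneck between IOPS and
# bandwidth. All workload figures are illustrative assumptions.

KIB = 1024
MIB = 1024 * KIB

def bandwidth_mib_s(iops: int, io_size_bytes: int) -> float:
    """Bandwidth consumed by a workload: IOPS multiplied by IO size."""
    return iops * io_size_bytes / MIB

# A transactional database: many small writes, latency-sensitive.
db_bw = bandwidth_mib_s(iops=20_000, io_size_bytes=4 * KIB)

# A video editing workload: fewer, much larger reads, throughput-bound.
stream_bw = bandwidth_mib_s(iops=2_000, io_size_bytes=1 * MIB)

print(f"Database:  20,000 IOPS x 4KiB = {db_bw:.0f} MiB/s")   # ~78 MiB/s
print(f"Streaming:  2,000 IOPS x 1MiB = {stream_bw:.0f} MiB/s")  # 2000 MiB/s
```

Even though the database issues ten times as many operations, the streaming workload consumes over twenty-five times the bandwidth, which is why the two are best judged against different metrics.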
Scalability
All organisations face the prospect of data growth at some point in their existence. Across the world, exabytes of new data are created every day, and while very few organisations have to deal with growth on that kind of scale, their storage system should still be expandable without disruption to existing workloads.
In some systems this is achieved by adding more and more disk shelves (scale up), which allows your capacity to grow, but adds no additional performance to the controllers of the system. In more modern scale out storage systems, adding capacity also adds compute, so you get the best of both worlds: more capacity and more performance!
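The difference between the two growth models can be sketched in a few lines. The per-shelf and per-node numbers here are placeholder assumptions chosen only to make the contrast visible.

```python
# Illustrative contrast between scale-up and scale-out growth.
# All capacity and IOPS figures are assumptions for the sketch.

def scale_up(shelves: int, tb_per_shelf: int, controller_iops: int):
    # Adding shelves grows capacity, but performance stays capped
    # at whatever the fixed pair of controllers can deliver.
    return shelves * tb_per_shelf, controller_iops

def scale_out(nodes: int, tb_per_node: int, iops_per_node: int):
    # Each node contributes both capacity and compute, so both grow.
    return nodes * tb_per_node, nodes * iops_per_node

print(scale_up(shelves=8, tb_per_shelf=100, controller_iops=200_000))
# capacity grows 8x, performance stays fixed
print(scale_out(nodes=8, tb_per_node=100, iops_per_node=50_000))
# capacity and performance both grow 8x
```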
Reliability
The primary purpose of a storage system is to safely store data. If an application cannot consistently retrieve data, then the storage system is next to useless. To protect data, modern storage systems use technologies like mirroring, parity or erasure coding to ensure that the loss of a disk or SSD doesn’t cause data loss. Storage systems also have multiple controllers and multiple client connections to ensure high availability, should any of those components fail. Scale out storage systems can provide even greater reliability as the software components that make up the cluster are distributed across many nodes, which allows the cluster to survive multiple hardware failures.
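The protection schemes mentioned above trade usable capacity for resilience in different ways. As a rough sketch, assuming 3x replication and a 4+2 erasure coding layout (both common but by no means the only choices), the usable fraction of raw capacity works out as follows:

```python
# Sketch of usable capacity under two common data protection schemes.
# The raw capacity and the 3x / 4+2 parameters are illustrative choices.

def usable_replication(raw_tb: float, copies: int = 3) -> float:
    """Each object is stored `copies` times, so usable = raw / copies."""
    return raw_tb / copies

def usable_erasure_coding(raw_tb: float, k: int = 4, m: int = 2) -> float:
    """k data chunks plus m parity chunks: usable fraction is k / (k + m)."""
    return raw_tb * k / (k + m)

raw = 600  # TB of raw disk, an assumed figure for the example
print(usable_replication(raw))     # 200.0 TB usable, tolerates 2 lost copies
print(usable_erasure_coding(raw))  # 400.0 TB usable, tolerates 2 lost chunks
```

Both layouts here survive two simultaneous failures, but erasure coding doubles the usable capacity at the cost of extra compute for the parity calculations, which is one reason it is popular for large archival pools.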
Flexibility
A storage system must be able to accommodate many different workloads, each with their own requirements. Some might be high performance, whilst others might be archival. The ability to migrate data between these different classes of storage pool is important, as it can free up expensive fast storage for other applications.
While linked to capacity growth and consumption, the ability to scale from a small storage system to a large one without compromising performance is very important. Migrating data is always a challenge and can cause application outages; having to migrate just to move to a larger capacity storage system should be a problem of the past!
It is also important to be able to shrink a cluster if an organisation no longer requires the total available capacity of the storage system. This is where scale out systems have an additional advantage over proprietary scale up systems, as they are built from general purpose hardware, which can be reused for other applications as necessary.
Featureset
When comparing multiple solutions, it is important to focus on the features that matter to you. Which protocols (block, file or object) do you need to use, and can the system support all of them? Do you need local replication like snapshots and clones? If so, how many of these is each system capable of creating and managing? Do you expect to need remote replication, or compliance features like data-at-rest encryption or object versioning?
Working with application owners in your organisation can help narrow down which features are really important, rather than choosing a solution based on the hero numbers or extreme limits sometimes shared by vendors.
Cost efficiency
In each of these areas, it's possible to make decisions that cause the cost of the storage system to increase, which is why it is important to match the needs of a use case to the capabilities of the system. For example, we could build a storage system with all-flash disks, but is that necessary for archival class storage that is accessed infrequently? Similarly, when thinking about the available features, do you need remote replication, and is there an extra licence cost for that feature?
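A simple tiering calculation makes the point. The per-TB prices below are entirely hypothetical placeholders for the sketch; substitute your own vendor quotes before drawing any conclusions.

```python
# Back-of-envelope media tiering cost sketch.
# FLASH_PER_TB and HDD_PER_TB are hypothetical prices, not real quotes.

FLASH_PER_TB = 150.0  # hypothetical $/TB for flash
HDD_PER_TB = 25.0     # hypothetical $/TB for spinning disk

def tiered_cost(hot_tb: float, archive_tb: float) -> float:
    """Hot data on flash, infrequently accessed archive data on HDD."""
    return hot_tb * FLASH_PER_TB + archive_tb * HDD_PER_TB

all_flash = (50 + 450) * FLASH_PER_TB              # everything on flash
tiered = tiered_cost(hot_tb=50, archive_tb=450)    # only hot data on flash

print(all_flash, tiered)  # 75000.0 vs 18750.0
```

With these assumed prices, placing the rarely accessed 450 TB on HDD cuts the media cost to a quarter of the all-flash figure, which is the kind of saving that matching media to workload makes possible.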
Being pragmatic and understanding the things you don’t need is just as important as understanding those that you do!
Open source options for enterprise storage
Balancing all of these needs – performance, scalability, flexibility and cost – can require compromise and a solid understanding of what you're looking to achieve across these areas.
Proprietary storage arrays often entail significant costs, paid upfront, for both support and future upgrades, and in some cases upgrades can be difficult and time consuming, especially if you have to migrate from a smaller system in order to expand further. Public cloud solutions are cheap and flexible to begin with, but once you have significant amounts of data stored, they are no longer the most economically efficient approach (if you're interested in going into more detail on this, read our white paper on that subject!).
Open source storage systems such as Ceph are ready for enterprise deployment, and can provide an economically advantageous answer to all of the needs described in this blog post. Canonical Ceph is a storage solution for all scales and all workloads, from the edge to large scale enterprise-wide deployments, and for all protocols (block, file and object).
Diverse use cases with different performance, capacity, and protocol needs can all be managed by a single scale out cluster. Ceph’s ability to scale horizontally using commodity hardware means that growth can be incremental, and tuned to meet either performance or capacity needs.
Learn more
Download our whitepaper – Software-defined storage for enterprises, to learn more about:
- The budget challenges businesses face when scaling their storage
- How open-source software-defined storage provides a viable alternative to legacy appliance-based storage systems
- How to use Ceph to future proof for:
- Reliability
- Scalability
- Flexibility
- How to scale for growth, but maintain cost efficiency
- How to reduce data silos with consolidation into a single multi-protocol storage cluster
- How to prepare for disaster situations with local and remote data replication
- How managed services can provide an “as-a-service experience” while reducing cost