Data Compression Definition: Description, Benefits, and Considerations

This is part of Solutions Review’s Premium Content Series, a collection of reviews written by industry experts in maturing software categories. In this presentation, ScaleFlux Co-Founder and Chief Scientist Tong Zhang provides a definition of data compression and an overview of the pros, cons, and potential barriers.

The importance of data compression cannot be overstated: the time for enterprises to adopt modern data compression technology by default was yesterday.

Data growth has gone from an interesting curiosity to an alarming and unstoppable force of nature. We produce data faster than we can process it. It is a disturbing realization: there is tremendous value waiting to be extracted from that data, but extracting it is becoming increasingly difficult and expensive.

This problem is accelerating faster than technology can keep up, and with the 5G wave just a few years away, the pressure is on. Businesses are scrambling to cope and are frustrated by the complexity of the problem, yet there is an obvious solution in plain sight. It is time to take a hard look at data compression and the assumptions we hold about its costs and benefits.

Data compression has been around for a long time and is commonly understood as modifying data so that it takes up less storage space. There is a common myth that compressing data means speed and performance are compromised, but that could not be further from the truth. Many of our assumptions about data, including how we generate, store, and value it, have evolved in recent years, and data compression is no exception. Data compression is very different today than it was ten years ago. To remain competitive, we must adopt a modern mindset.

It’s time to explore what types of data are compressible, how data compression has changed, and the benefits of adopting this strategy. Let’s bust some of the most common myths about data compression.

What is compressible and what is not?

With the exception of multimedia data (e.g., video and images) and encrypted data, virtually all other types of data are compressible. In fact, it is safe to say that a company's most valuable data is inherently compressible. This includes transactional data, sensor data, IIoT (Industrial Internet of Things) logs, and messaging and streaming data. These types of textual data must be processed in real time so that users can extract their true business value. And for companies that use machine learning and artificial intelligence, this data is crucial for training models.
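A minimal sketch, using only the Python standard library, of the contrast described above: structured, textual records compress well, while already-compressed or encrypted payloads (approximated here by random bytes) do not. The sample records are made up for illustration.

```python
import json
import os
import zlib

# Structured "transactional" records: repetitive keys and values compress well.
records = [
    {"order_id": i, "status": "shipped", "warehouse": "EU-CENTRAL", "qty": i % 5}
    for i in range(10_000)
]
textual = json.dumps(records).encode("utf-8")

# Stand-in for multimedia or encrypted payloads: high-entropy random bytes.
opaque = os.urandom(len(textual))

for label, payload in [("textual/log-like", textual), ("random/encrypted-like", opaque)]:
    compressed = zlib.compress(payload, level=6)
    print(f"{label}: {len(payload)} -> {len(compressed)} bytes "
          f"({len(compressed) / len(payload):.0%} of original)")
```

On typical runs, the textual records shrink to a small fraction of their original size, while the random payload stays essentially the same size, which is the dividing line between compressible and incompressible data.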

The relative percentage of compressible data remains unchanged, but the volume of data generated is growing exponentially as the digital universe expands. So the question becomes: how can companies store, process, and make sense of all this data in order to achieve better business outcomes?

New Ways to Compress

There is a common misconception that compressing data leads to reduced speed and performance. The perception stems from the fact that general-purpose compression algorithms are poorly suited to modern CPU (central processing unit) architectures: no matter how well a processor is engineered, compression algorithms running on it remain a performance bottleneck.
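A rough, hedged illustration of that bottleneck: the sketch below measures how fast a single core can run a general-purpose software compressor. The synthetic log line and the exact numbers are illustrative only and will vary by machine and codec; the point is simply that one core's software compression throughput is easy to saturate relative to modern storage.

```python
import time
import zlib

# ~4 MB of repetitive, log-like data (synthetic, for illustration only).
payload = (b"timestamp=1700000000 level=INFO msg=order_processed region=eu " * 16) * 4096

rounds = 20
start = time.perf_counter()
for _ in range(rounds):
    zlib.compress(payload, level=6)
elapsed = time.perf_counter() - start

mb = len(payload) * rounds / 1e6
print(f"compressed {mb:.0f} MB in {elapsed:.2f}s -> ~{mb / elapsed:.0f} MB/s on one core")
```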

Fortunately, there is a way around this problem. By offloading compression from the CPU, businesses can reap many benefits (more on that later), including dramatically increased speed and performance. But where should compression happen if not on the CPU? There are many ways to go about it, and there is no one-size-fits-all solution, but SSDs (solid-state drives) with built-in transparent compression are a great way to overcome the speed and performance hurdles while keeping storage costs down.

A look at the benefits

Speaking of cost, data compression is the most efficient and easiest-to-deploy way to reduce the overall cost of data storage. Other cost-reduction techniques, such as deduplication, are far more complicated to deploy and maintain, and can cause a marked degradation in performance. In addition to reducing storage costs, data compression can also improve performance and reduce latency.

Let’s look at a few scenarios where data compression can provide significant benefits.

1. Relational databases (e.g., MySQL, PostgreSQL, Oracle, SQL Server). It is well known that relational databases contain highly compressible data. However, relational database users rarely compress this data on the CPU because of the impact on performance. By deploying a solution that transparently compresses relational database data without any CPU overhead, users can see storage costs drop by more than 50% and performance improve by more than 2x.

2. Latency-critical key-value stores (e.g., Aerospike, CacheLib). Due to their latency-critical nature and inherent data structures, key-value stores like Aerospike (widely used in latency-sensitive domains such as finance) cannot rely on CPU-based compression, despite holding highly compressible data. By deploying a solution that transparently compresses key-value store data without CPU overhead, users can save more than 50% on storage costs and cut tail latency by more than 5x.

3. Data streaming platforms (e.g., Apache Kafka). Widely deployed in modern IT infrastructures, data streaming platforms such as Kafka consume a significant amount of storage and networking resources. Most streaming data is highly compressible, but its high velocity makes it difficult to dedicate CPU cycles to compression. By transparently compressing streaming data, users can save more than 50% on storage costs in addition to networking cost savings (a minimal producer-side sketch follows this list).
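For context on where compression conventionally sits in a streaming pipeline, here is a minimal sketch using the third-party kafka-python client. The broker address and topic name are placeholders, and this is ordinary CPU-based batch compression on the producer, shown only as the baseline that drive-level transparent compression avoids.

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",   # placeholder broker address
    compression_type="gzip",              # also accepts "snappy", "lz4", "zstd"
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Repetitive, textual events like these batch and compress well.
for i in range(1000):
    producer.send("sensor-readings", {"sensor": i % 10, "temp_c": 21.5, "ok": True})

producer.flush()
```

Producer-side compression like this saves network and storage at the cost of CPU cycles on the producing host, which is exactly the trade-off that transparent, in-drive compression is meant to remove.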

Is sustainable data growth possible?

Technologies like 5G are on the horizon, and businesses are about to be hit by a tidal wave of data that will push their storage beyond its physical limits. Businesses can no longer rely on their storage solutions to scale forever. It’s time to consider solutions that reduce the data footprint and bring total storage costs under control before the next data explosion takes place. Companies that turn to data compression to proactively solve this problem will come out on top while reaping the benefits of increased speed and performance.

Tong Zhang