Compression on Linux is an incredibly useful tool for storing, archiving, and transferring files. It reduces the amount of storage space required, cuts the bandwidth needed to move data around, and can even save you money by reducing the likelihood of you needing to buy further storage.
But when it comes to actually using compression software for Linux, it’s all about compromise. You can have fast compression, thorough compression, or something in between, and you can either accept reduced quality or sacrifice a bit on the overall degree of compression. Either way, you can't have everything at once. So, what are the differences between these compression types?
Using lossy or lossless compression
Standard compression software for Linux falls into two types: lossy and lossless.
Lossy compression works by removing the least important information from a file in order to shrink its file size. For example, when compressing a JPEG image using lossy compression, you’d expect to see a loss of image quality as the name suggests. In a high-resolution image, lossy compression finds multiple colours that are very similar in shade and replaces them with a single combined colour. This reduces the number of different colours within the compressed image, but it comes with a loss of quality.
However, the reduction in quality can be lessened or increased depending on how compressed the user wants the file to be. More compression gives a smaller file but a lower-quality image, while less compression gives a better-quality image but a larger file.
Regardless of how much you compress a file, once it has been compressed with a lossy method it can't be returned to its original quality when decompressed.
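The colour-merging idea can be illustrated with a minimal Python sketch. This is not how JPEG actually works internally; it simply assumes a row of greyscale pixel values from 0 to 255 and a hypothetical quantise helper that snaps similar shades onto one shared value. The `levels` parameter is assumed to be a power of two.

```python
def quantise(pixels, levels):
    """Map each 0-255 value onto one of `levels` evenly spaced shades."""
    step = 256 // levels  # width of each bucket of similar shades
    # Every value in a bucket is replaced by that bucket's midpoint.
    return [(p // step) * step + step // 2 for p in pixels]

original = [200, 201, 203, 204, 90, 92, 95]
lossy = quantise(original, levels=8)

print(lossy)
print(len(set(original)), "distinct shades before,", len(set(lossy)), "after")
```

Fewer distinct shades are much easier to store compactly, but nothing in the quantised list records which of the original shades each pixel had, which is why the loss can't be undone.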
Lossless compression avoids this problem by compressing files without any loss of quality. The compression utility uses algorithms to recognise repeated information and replaces each repeat with a shorter placeholder. These placeholders then act almost like instructions on how to recreate the file, so that when the compressed file is decompressed, it returns to the exact state it was in previously.
For example, a row of pixels in an image might be compressed to r2w4b3 to represent two red pixels, followed by four white pixels, followed by three blue pixels. Similarly, a string of text like ‘hhhhhaaaaaaaaa’ might be compressed to h5a9. This shorthand shrinks the file, but because no information is actually lost, the file can be restored to its original state after decompressing.
In general, lossless is preferred over lossy because there is no reduction in file quality, and for data such as documents and programs, where every byte has to survive, it is the only sensible option. However, because lossy compression actually removes information, it usually results in a smaller compressed file size.
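This kind of shorthand is known as run-length encoding. A minimal Python sketch is shown here. The function names are made up for illustration, and it assumes the input contains no digits, so that counts and characters can't be confused.

```python
import re
from itertools import groupby

def rle_encode(text):
    # Collapse each run of identical characters into "<char><run length>".
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(text))

def rle_decode(encoded):
    # Expand each "<char><count>" pair back into the original run.
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return "".join(ch * int(count) for ch, count in pairs)

pixels = "rrwwwwbbb"  # two red, four white, three blue
print(rle_encode(pixels))            # r2w4b3
print(rle_encode("hhhhhaaaaaaaaa"))  # h5a9
print(rle_decode("r2w4b3") == pixels)
```

Decoding reverses encoding exactly, which is what makes the scheme lossless. Real tools such as gzip use far more sophisticated algorithms, but the principle of describing repeats compactly is the same.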
Comparing Linux compression tools – fast or thorough?
As with lossy and lossless compression, choosing between compression tools also involves a compromise. It comes back to the earlier choice between fast compression, thorough compression, or somewhere in between.
Compression can be quick and convenient, but the file size will not be reduced drastically. To greatly reduce file size, you'll need more processing time, meaning compression will be slower.
Because of these required compromises, no single compression tool is the perfect choice. Instead, there is a wide range of tools that each have their own strengths and weaknesses. All of the tools covered here are available on Linux and are good examples of lossless compression.
gzip is possibly the ‘original’ and most well-known of all Linux compression tools, and with good reason. gzip uses very little system memory, and as such, offers very quick compression and decompression of files when compared to other compression tools.
As we’ve learnt, though, this quick compression comes with a compromise. Files are compressed less thoroughly with gzip than with other tools, which leads to comparatively larger file sizes. The balance between compression ratio and speed can be adjusted (gzip accepts levels from 1, the fastest, to 9, the most thorough), but more thorough compression again comes at the cost of speed.
One benefit of gzip, though, is its compatibility. Because it’s been around since 1992, it is available on almost all Linux systems.
Unfortunately, gzip has a second distinct disadvantage. gzip only compresses individual files and has no way of bundling a folder into a single archive by itself, whereas other tools can compress entire folders at once. The usual workaround is to combine the folder into a single file with tar first and then gzip the result. As such, gzip on its own is probably only the best choice when a single file needs compressing quickly and at a lower degree of compression. It is not a logical option for larger-scale use cases where more thorough compression is needed.
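The speed versus thoroughness setting can be tried out with Python's built-in gzip module, whose `compresslevel` argument mirrors the `-1` to `-9` options of the gzip command. This is a minimal sketch. The sample data is made up, and how much the two levels differ depends heavily on the data being compressed.

```python
import gzip

# Hypothetical sample data: repetitive text compresses well.
data = b"Linux compression is all about compromise. " * 20000

fast = gzip.compress(data, compresslevel=1)      # quickest, least thorough
thorough = gzip.compress(data, compresslevel=9)  # slowest, most thorough

print("original:", len(data), "bytes")
print("level 1: ", len(fast), "bytes")
print("level 9: ", len(thorough), "bytes")

# Whatever the level, decompression restores the data exactly.
restored = gzip.decompress(thorough)
print(restored == data)
```

On real-world files, such as logs or source code, the higher levels usually produce noticeably smaller output in exchange for extra processing time.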
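In practice, gzip is usually paired with tar, which bundles a folder into a single file that gzip can then compress (on the command line, `tar -czf project.tar.gz project/`). A rough Python equivalent using the standard tarfile module is sketched here. The directory, the file names, and the archive_directory helper are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path

def archive_directory(source_dir, archive_path):
    # "w:gz" writes a tar archive and gzip-compresses it in one pass.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=Path(source_dir).name)

# Build a throwaway directory with two example files to archive.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    project.mkdir()
    (project / "notes.txt").write_text("hello " * 1000)
    (project / "todo.txt").write_text("compress me " * 1000)

    archive = Path(tmp) / "project.tar.gz"
    archive_directory(project, archive)

    # Read the archive back to confirm both files are inside it.
    with tarfile.open(archive, "r:gz") as tar:
        names = sorted(tar.getnames())
    print(names)
```

The same approach works for bzip2 by swapping the mode string to "w:bz2".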
7-Zip compression, on the other hand, is a better option for larger-scale needs because it offers compression of entire file directories and folders in one go. 7-Zip also provides impressively thorough compression and can reduce file sizes significantly with no loss of quality. But this high compression ratio demands significant system resources, which can make compression slow overall.
The 7-Zip compression utility does perform well in decompression though. Even thoroughly compressed files can be decompressed quickly on the other end. This has benefits in use cases like the distribution of apps and software. The developer can compress their app to a small file size with 7-Zip and then upload it to be downloaded.
While this compression might take a while, it's incredibly thorough and the download on the user’s side is very quick. Assuming the developer has a powerful machine, the high system performance requirements shouldn’t be too much of an issue.
bzip2 compression falls somewhere between gzip and 7-Zip. It compresses more thoroughly than gzip but less thoroughly than 7-Zip, and it is faster than 7-Zip but not as fast as gzip. The smaller file sizes compared with gzip come at a cost, though, as this compression takes about four times as long.
Like gzip, bzip2 can only compress single files at one time. This makes it unsuited to archiving large file directories on its own, but well suited to single-file compression where a smaller file size is required. Again, this thorough compression requires more memory and processing resources than gzip, but the results are better.
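To get a feel for where each tool sits, the three approaches can be compared on the same data with Python's standard library. This is only a rough sketch. The sample data is made up, Python's lzma module writes the xz format rather than 7-Zip's own 7z format (both are based on the LZMA family of algorithms), and the timings will vary from machine to machine.

```python
import bz2
import gzip
import lzma
import time

# Hypothetical sample data; real files will give different numbers.
data = b"".join(
    b"line %d: the quick brown fox jumps over the lazy dog\n" % i
    for i in range(50000)
)

codecs = {
    "gzip": gzip.compress,
    "bzip2": bz2.compress,
    "xz (LZMA)": lzma.compress,
}

results = {}
for name, compress in codecs.items():
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    results[name] = (len(packed), elapsed)
    print(f"{name:10} {len(packed):>9} bytes  {elapsed:.3f} s")
```

Running a comparison like this on a sample of your own files is the most reliable way to pick a tool, since both size and speed depend on the kind of data being compressed.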
lbzip2 is a more modern answer to the trade-offs of Linux compression tools. It looks to offset the system requirement drawbacks of bzip2 compression by spreading the compression process over multiple cores.
The older, more traditional tools like bzip2 and gzip compress files using a single core, regardless of the number of cores a CPU has. Modern computers, however, can have anywhere from two to more than sixteen cores. So by splitting the work across multiple cores and then stitching the results together at the end, lbzip2 can compress files more quickly by making fuller use of the hardware.
Essentially, lbzip2 uses the same compression algorithm as bzip2, resulting in the same compressed file size but in much less time.
There are many more compression tools available for Linux, such as xz, lzop, and p7zip (the Linux port of 7-Zip), but they all follow the same basic trend: the smaller the compressed file size, the slower the compression. But as we've seen, technology like that found in lbzip2 aims to buck this trend by adopting modern, multi-core techniques for compression.
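The split, compress, and stitch idea can be sketched in Python. This is a simplified illustration rather than lbzip2's actual implementation. The chunk size, the worker count, and the parallel_bz2 helper are assumptions, and threads are used on the understanding that CPython's bz2 module releases its interpreter lock while compressing.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 900_000  # assumed chunk size, close to bzip2's largest block

def parallel_bz2(data, workers=4):
    # Split the input into independent chunks.
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    # Compress the chunks concurrently; map() preserves their order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        streams = list(pool.map(bz2.compress, chunks))
    # Stitch the compressed streams back together in order.
    return b"".join(streams)

data = b"multi-core compression example\n" * 200_000
packed = parallel_bz2(data)

# Concatenated bzip2 streams decompress as one continuous file.
restored = bz2.decompress(packed)
print(len(data), "->", len(packed), "bytes; round trip ok:", restored == data)
```

Because each chunk is compressed independently, the output of this sketch may differ slightly in size from a single-pass bzip2 run, but any standard bzip2 decompressor that handles concatenated streams should be able to read it.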
While we can't directly help you with your compression conundrums, here at Fasthosts, we're more than capable of providing you with plenty of Cloud Server space for all your file and backup needs. Not only that, but we also provide Web Hosting and Dedicated Servers so whatever your project's requirements, we've got you covered.
You can discover even more information on Linux, compression, and other software tools by visiting the Fasthosts blog. We've plenty of detailed articles and guides which can help you enhance your computer know-how.