home products company partners clients news careers contact us
lossless compression fpga asic core

Data Compression
background information

Data Compression - a short backgrounder

Data compression is the process of encoding information using fewer bits than the original. Compression techniques can be used to increase the quantity of data which may be stored on media which otherwise might be too restricted in capacity. Similarly, network speed may be apparently increased using compression to reduce the quantity of data being transferred.

Some schemes are completely reversible so that the original data can be accurately reconstructed (lossless data compression), while others accept some loss of data in order to achieve higher compression (lossy data compression). The latter category is appropriate for audio and video compression (eg. MP3 and MPEG), but here we consider applications where compression must be lossless.

Compression algorithms may be designed to suit exactly any known characteristics of the data, but what is often needed is a general purpose algorithm which extracts the redundancy from the repetitive nature of certain computer files, especially those containing program code and other humanly-readable text.

Lempel and Ziv

The Lempel-Ziv compression technique developed in 1977 is the basis for many popular algorithms for lossless compression, and many of its derivatives are generally free from patent claims.

The LZ77 methods work by identifying repeated strings, looking back (never forward) into a sliding window of history of the data. The potential set of strings to be matched against is called the dictionary, which is populated as the data arrives at the encoder. Different implementations may put more or less effort into identifying such patterns, resulting in many distinct compressed representations which expand back to the same original data.

An important parameter of the LZ77 algorithm is the amount of history which is available to search. This is called the history depth: the size of the sliding window looking backward from the current position. The larger the history, the better the compression performance, with ever decreasing returns. Where possible, data should be processed in as large a block as possible, and with as large a history as possible, remembering that both the compressor and decompressor must provide memory for this history.

The LZ77 methods differ mostly in how they encode the matched information. For example the popular DEFLATE variant (which forms the basis of many commonly used "zip" utilities, and the Zlib library) combines an enhanced LZ77 string matcher with statically or dynamically chosen Huffmann codes for optimal performance. The LZMA variant used in 7-Zip follows the LZ77 stage with Markov chain and range encoders. Such state-of-the-art algorithms used on desktop computers require significant memory resources and CPU power to achieve good results, and are not always practical for hardware implementation.

LZRW Algorithms

One particular family of LZ77 derivatives is LZRW. These were developed by Ross Williams around 1990 and are particularly amenable to high throughput hardware implementation. Both the earlier LZRW1-A and the later LZRW3 use the same hash table lookups to identify matches, and so provide (almost) identical compression. However, LZRW1 transmits the match information as offset/length pairs, whereas LZRW3 transmits hash/length pairs, allowing greater history depths to be referenced at the expense of maintaining an identical hash table at the decompressor.

Some small but important modifications are made to the original algorithms which allow better hardware implementation. For both LZRW1 and LZRW3, a proper sliding window (rolling history) can be used for longer blocksizes without increasing the RAM utilisation significantly. This will be appropriate for customers moving to larger frame sizes, or for storage applications which use much larger data blocks.

Typical Performance

The compression ratio (defined here as compressed length as a percentage of the original length) is obviously highly dependent on the characteristics of the data itself. The table below shows some average results for test files taken from the standard Calgary and Canterbury datasets often used for comparison. The three files chosen represent different corners of the possible data space; the "easy" file is a bitmap which compresses well, the "text" file contains much humanly-readable redundancy like most email and HTML pages, and the "hard" file contains database numbers with only a small amount of repetition.

For any given dataset, the compression ratio improves as the history increases, but it is clear that the primary factor is the data itself. Note that the compression ratio on standalone short frames will never be that impressive, as there is simply not enough redundancy to extract!

LZRW3 HISTORY DEPTH (Bytes)
DATASET (filename) 2K1 4K 8K2 16K2
Easy (canterbury/ptt5) 26.0% 25.2% 24.8% 24.5%
Text (canterbury/asyoulik.txt) 74.9% 70.0% 66.0% 62.7%
Hard (calgary/geo) 92.7% 89.1% 85.8% 83.7%
Canterbury Average (11 files) 50.7% 48.1% 45.8% 43.7%

1. LZRW1 performance for 2K is marginally better than LZRW3.
2. History depths >4K bytes are not supported by LZRW1.

For area and throughput metrics in popular FPGA technologies, please see the LZRW Compression Core and Compression System product pages.

Compression and Encryption

Data compression and encryption are often closely related within a system. It is worth noting that data which has been encrypted will not compress effectively due to the way in which encryption naturally removes any repeats or patterns from the plain data. This means that when these two operations co-exist, the order in which they are applied is very important, as is a full understanding of the relative data throughputs through the compression function. Significant care must be taken in the design of such a system to ensure the most effective use of the hardware.

Contact

For more detailed information on this or any of our other products and services, please feel free to email us at helioncores@heliontech.com and we will be pleased to discuss how we can assist with your individual requirements.


Copyright © Helion Technology Licensing Ltd, 1998-2024. All rights reserved. Privacy and Cookies
Web Site Developed by Goldstag Limited