I had dinner with Mr. Backup (Curtis Preston) the week before last at Legal Seafood in Boston in conjunction with the BD Event and took the opportunity to ask Curtis about several of his posts that emphasized the importance of deduplication ratios.

To summarize the various posts and cross posts, the issues on the table are 1) the quantity of disk to manage (monitor, replace on failure, etc.), power, and cool, and 2) the effect on replication and bandwidth.

Both points are well taken, but there is one additional point to consider: The most relevant metric to use in these discussions isn’t solely the deduplication ratio – it’s the “data reduction ratio”. The data reduction ratio embraces the deduplication ratio but – most importantly – it recognizes the data compression ratio as a further component of data reduction.

Let me explain using block-level deduplication as the basis of this discussion. If a 100MB file (consisting of 1,600 unique 64KB blocks) is stored 5 times in 5 different directories (or under 5 different file names), a block deduplication system will reduce 1,600 x 5 blocks (8,000 blocks) to just 1,600 blocks (the blocks are assumed to be unique within the file, so there is no deduplication among the 1,600 blocks themselves). Hence, 500MB of file data deduplicates down to 100MB of data blocks, for a data deduplication ratio of 5:1.

But what happens if those same 1,600 blocks are compressed at a ratio of 2:1 as they are being deduplicated, before they are stored on disk? The amount of data actually stored becomes ½ x 100MB, or just 50MB. And 50MB stored that represents 500MB of non-deduplicated, uncompressed data works out to a data reduction ratio of 10:1. So the total data reduction ratio is the deduplication ratio multiplied by the compression ratio, which makes compression a multiplier on data reduction effectiveness.
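For readers who want to check the arithmetic, here is a minimal sketch of the idea in Python. It is an illustration only, not a description of how any particular product works: SHA-256 fingerprints and zlib stand in for whatever hashing and compression a real system uses, the function names (make_file, reduce_blocks, ratios) are made up for this example, and a small 1MB test file of sixteen 64KB blocks stands in for the 100MB file. The actual compression ratio depends entirely on the data, so it will not be the 2:1 used above, but the relationship between the three ratios holds regardless.

```python
import hashlib
import math
import zlib

BLOCK_SIZE = 64 * 1024  # 64KB blocks, as in the example above


def make_file(num_blocks, block_size=BLOCK_SIZE):
    """Build a test file whose blocks are all different from one another."""
    # Each block repeats its own 8-digit index, so blocks are unique
    # within the file but still compressible.
    return b"".join((b"%08d" % i) * (block_size // 8) for i in range(num_blocks))


def reduce_blocks(files, block_size=BLOCK_SIZE):
    """Deduplicate and compress a list of byte strings, block by block.

    Returns (logical_bytes, unique_bytes, stored_bytes).
    """
    store = {}  # block fingerprint -> compressed block
    logical = 0  # bytes presented to the system
    unique = 0   # bytes left after deduplication, before compression
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            logical += len(block)
            fingerprint = hashlib.sha256(block).digest()
            if fingerprint not in store:
                # First time this block is seen: compress it, then keep it.
                unique += len(block)
                store[fingerprint] = zlib.compress(block)
    stored = sum(len(c) for c in store.values())
    return logical, unique, stored


def ratios(logical, unique, stored):
    """Return (deduplication, compression, data reduction) ratios."""
    return logical / unique, unique / stored, logical / stored


# The same file stored 5 times, as in the example above.
copies = [make_file(16)] * 5
logical, unique, stored = reduce_blocks(copies)
dedup, compression, total = ratios(logical, unique, stored)

print(f"deduplication ratio:  {dedup:.1f}:1")
print(f"compression ratio:    {compression:.1f}:1")
print(f"data reduction ratio: {total:.1f}:1")
```

The deduplication ratio comes out at 5:1 because each block is stored once no matter how many copies of the file arrive, and the data reduction ratio is that figure multiplied by whatever compression ratio the data happens to achieve.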

So if the data reduction ratio is, indeed, the operative metric, then compressing blocks as they are deduplicated should be an essential component of a comprehensive data reduction system.

Exar’s newly announced BitWackr C Series adds hardware compression (think compression at the tape drive level) to deduplicating backup-to-disk software from Symantec, CA Technologies, BakBone or CommVault. These are great backup products made even greater through the addition of hardware compression.

A description of the BitWackr C Series is available here.


Written by BitWackr

June 29, 2010 at 12:01 pm

