BitWackr Blog

Deduplication beyond backup

Hardware Compression for Disk and RAID Storage

Data compression using dedicated hardware has been a standard feature of tape drives for years. It roughly doubles tape cartridge capacity while improving performance, and it is almost universally used. This post discusses how Exar has overcome the obstacles that have kept hardware compression out of disk and RAID storage.

The Block Manager 

Disk and RAID storage use fixed length blocks. If a file that consumes 10 blocks is compressed and stored in just five blocks, a 2:1 reduction in storage capacity is achieved.

Applying block-level data compression to disk drives or RAID storage means that the blocks themselves have to be reduced in size so they can be stored in less physical space following compression. The critical link between RAID’s fixed length blocks and variable length compressed blocks is a new component called a “block manager”.

The block manager packages compressed blocks into shorter fixed length blocks that are stored on disk drives or on a RAID array. It is also the block manager’s responsibility to keep track of the blocks so the original data can be properly reconstituted when read.

A combination of data reduction and performance testing confirms that pre-determined container sizes of 48KB, 32KB and 16KB, holding 64KB blocks that have been compressed by ¼, ½ or ¾ respectively, provide an excellent balance between performance and capacity reduction.
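As an illustration of the packaging step, here is a minimal sketch of how a block manager might choose a container and keep track of it. The names `pick_container` and `BlockMap` are hypothetical, and the 64KB fallback for blocks that do not shrink below 48KB is an assumption of this sketch, not a documented BitWackr behavior.

```python
KB = 1024
LOGICAL_BLOCK = 64 * KB
# Container sizes named in the text.
CONTAINER_SIZES = (16 * KB, 32 * KB, 48 * KB)


def pick_container(compressed_len):
    """Return the smallest fixed-size container for one compressed 64KB block."""
    if compressed_len <= 0:
        raise ValueError("compressed length must be positive")
    for size in CONTAINER_SIZES:
        if compressed_len <= size:
            return size
    # Assumption: a block that does not fit in 48KB occupies a full 64KB container.
    return LOGICAL_BLOCK


class BlockMap:
    """Hypothetical map from logical block number to (container size, payload length)."""

    def __init__(self):
        self.entries = {}

    def record(self, lbn, compressed_len):
        # Remember where and how the block was packed so it can be rebuilt on read.
        self.entries[lbn] = (pick_container(compressed_len), compressed_len)

    def physical_bytes(self):
        # Total physical space consumed by all containers.
        return sum(size for size, _ in self.entries.values())

    def reduction_ratio(self):
        # Logical bytes stored divided by physical bytes consumed.
        return len(self.entries) * LOGICAL_BLOCK / self.physical_bytes()
```

Rounding up to the next container wastes some space inside each container, but it keeps every stored block a fixed length, which is what disk and RAID storage expect.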

Performance

Exar’s Express DR1600 cards can compress data at speeds of up to 300 MB/second, so when they are paired with an efficient block manager, compression can be performed in the data path without imposing a performance penalty. Indeed, because compression reduces the amount of data written to or read from disk, significant performance gains are typically realized when writing compressed data to disk.
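The performance argument can be made concrete with a deliberately simplified model. It assumes the host-visible rate is limited either by the compression engine or by the disk rate multiplied by the compression ratio, and it ignores block manager overhead. The function name and the example disk rates are illustrative assumptions; only the 300 MB/second figure comes from the text.

```python
def effective_write_rate(disk_mb_s, ratio, engine_mb_s=300.0):
    """Estimate host-visible MB/s when data is compressed inline.

    disk_mb_s   -- raw write rate of the disk or array (assumed input)
    ratio       -- compression ratio, e.g. 2.0 for 2:1
    engine_mb_s -- compression engine limit (300 MB/s per the text)
    """
    # The disk only has to absorb 1/ratio of the logical data,
    # so it sustains disk_mb_s * ratio of logical throughput.
    return min(engine_mb_s, disk_mb_s * ratio)
```

Under this model, a disk that writes 100 MB/second would appear to accept 200 MB/second of 2:1 compressible data, while a 200 MB/second array would be capped by the 300 MB/second engine.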

Preserving Data Integrity

Preserving data integrity is the first and most important priority for any storage product designer, product manager or development engineer. Unlike data compression performed in software on general purpose processors, the BitWackr offers enhanced data integrity capabilities that operate throughout the data retention lifecycle.

For added data integrity during compression and decompression, the Express DR1600 cards decompress data immediately after compressing it to verify that the compression operation was successful. They also use a CRC to verify the decompression operation. If a decompression operation fails, it is retried in software to identify whether the failure was caused by the command submission, by corrupted data or by the card itself.

The CRCs produced by compression operations are stored with the compressed block on disk and compared to the CRCs generated when the block is decompressed, assuring end-to-end data integrity throughout the compression/storage/decompression process.
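The verify-after-compress and CRC-on-read checks can be sketched in software. This example uses Python's `zlib` module as a stand-in for the hardware engine; the function names are hypothetical, and the CRC polynomial and on-disk layout used by the Express DR1600 are not specified in this post.

```python
import zlib


def compress_verified(block):
    """Compress a block, then decompress it at once to confirm the result."""
    crc = zlib.crc32(block)
    payload = zlib.compress(block)
    # Read-back check: the decompressed payload must match the original CRC.
    if zlib.crc32(zlib.decompress(payload)) != crc:
        raise IOError("compression verify failed")
    # The CRC is returned so it can be stored alongside the compressed block.
    return payload, crc


def read_verified(payload, stored_crc):
    """Decompress a stored block and compare its CRC with the stored one."""
    block = zlib.decompress(payload)
    if zlib.crc32(block) != stored_crc:
        raise IOError("CRC mismatch on decompression")
    return block
```

Usage: `payload, crc = compress_verified(data)` on the write path, then `read_verified(payload, crc)` on the read path returns the original data or raises an error.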

Thin Provisioning

“Thin provisioning” simply means that the file system interprets a storage volume as having a (virtual) capacity that is in excess of its physical capacity. A thinly provisioned virtual logical unit (LUN) target is only partially backed by physical storage at the time of its creation with additional physical storage being added to the LUN as needed. The storage controller reports the capacity of a thinly provisioned LUN as its virtual capacity.

Disk data compression is a thin provisioning technique that improves storage utilization efficiency. The virtual capacity of the compressed disk target has to be set based on a fixed assumption about the data reduction ratio that will be achieved. However, because the compressibility of the data stored on a compressed volume cannot be determined in advance, the volume can end up over-committed if the data compresses less than expected, and the size of this “over commitment level” is unknown at the time of volume creation. This presents additional challenges to the backup-to-disk storage administrator, the most obvious being an out-of-capacity condition.
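A short worked example shows how the over-commitment arises. The function names and the capacities used here are illustrative assumptions, not BitWackr settings.

```python
def virtual_capacity_gb(physical_gb, assumed_ratio):
    """Capacity reported to the file system for an assumed reduction ratio."""
    return physical_gb * assumed_ratio


def overcommit_gb(physical_gb, assumed_ratio, actual_ratio):
    """Virtual capacity that physical storage cannot back at the actual ratio."""
    reported = virtual_capacity_gb(physical_gb, assumed_ratio)
    # Logical data that really fits, given how well the data compressed.
    storable = physical_gb * actual_ratio
    return max(0.0, reported - storable)
```

For example, a 1000 GB volume provisioned on a 2:1 assumption reports 2000 GB. If the data only compresses 1.5:1, just 1500 GB of logical data fits, leaving 500 GB of reported capacity with no physical backing.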

Tape avoids out-of-capacity conditions by using an “early warning” marker on the media. When the backup software detects the early warning marker, it is the signal to begin preparing for the next tape in a sequence. There is, however, no concept of an early warning marker on a disk drive or RAID volume.

The potential liability is for the backup software application to crash into the end of the disk media while writing, resulting in failure of the backup job and possible damage to the file system.

The design of Windows NTFS did not anticipate its use with LUNs that have different physical and virtual capacities. An over-commit error condition, in which physical capacity is not available even though the virtual capacity indicates otherwise, is problematic for NTFS.

The BitWackr C Series’ compressed block manager eliminates disk out-of-space conditions through the use of several techniques, including a “reserve area” that is visible only to the BitWackr C Series’ software until it is required to prevent an out-of-space condition.

Additional out-of-space safeguards include the following:

  • To prevent an out-of-space condition from occurring, the BitWackr C Series issues multiple levels of alerts as the amount of physical capacity remaining is depleted. Email alerts are issued at user-defined levels, typically at 85% (information level), 90% (warning level) and 95% (critical level) of the physical capacity being used.
  • As a LUN reaches the over-committed condition and cannot map more blocks, the BitWackr C Series takes the over-committed volume offline and fails existing and future READ and/or WRITE commands.
  • A compressed disk or RAID volume with 1GB or less of physical capacity remaining will be prevented from mounting (there is a mechanism in place to “force” the mount of a volume with 1GB or less of physical capacity remaining, but that is beyond the scope of this document).
  • Following a Microsoft-recommended thin provisioning protocol described in a recent knowledge base posting[1], the BitWackr C Series returns special status condition codes that cause Windows to trigger a bus re-scan that ultimately results in the PnP removal of the LUN.

These actions taken by the BitWackr C Series prevent any I/O activity from taking place that could leave the file system in an inconsistent state.
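The alert levels and the mount guard described above can be sketched as follows. The thresholds and the 1GB limit come from the text; the function names, the `force` flag and the return values are hypothetical, and the actual product exposes these as user-defined settings.

```python
GB = 1024 ** 3
# Typical thresholds from the text, checked from most to least severe.
ALERT_LEVELS = ((95, "critical"), (90, "warning"), (85, "information"))


def alert_level(used_bytes, physical_bytes):
    """Return the alert level for the current physical usage, or None."""
    percent_used = 100 * used_bytes / physical_bytes
    for threshold, name in ALERT_LEVELS:
        if percent_used >= threshold:
            return name
    return None


def may_mount(free_bytes, force=False):
    """Refuse to mount a volume with 1GB or less of physical capacity left."""
    # 'force' stands in for the override mechanism mentioned in the text.
    return force or free_bytes > GB
```

Checking the thresholds from highest to lowest means a volume at 96% usage reports only the critical level rather than all three.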

Space Reclamation

The next challenge to compressing disk and RAID volumes is reclaiming unused space. 

The BitWackr C Series includes a utility, similar to a traditional defrag utility, that performs compressed volume space reclamation.

Utilities

Compressed disk and RAID storage requires a comprehensive set of utilities: to perform space reclamation, to replace chkdsk, defrag and the other programs needed to perform a reset following any type of failure, and to keep the device operating optimally. This utility software is included as part of the BitWackr C Series product.

The BitWackr C Series also makes deduplicating backup software even better. Check out Exar’s BitWackr C Series here.


[1] http://support.microsoft.com/kb/959613

Written by BitWackr

June 30, 2010 at 8:54 am
