BitWackr Blog

Deduplication beyond backup

2010 in review


The stats helper monkeys at WordPress.com mulled over how this blog did in 2010, and here’s a high-level summary of its overall blog health:

Healthy blog!

The Blog-Health-o-Meter™ reads “This blog is doing awesome!”

Crunchy numbers


A Boeing 747-400 passenger jet can hold 416 passengers. This blog was viewed about 1,600 times in 2010. That’s about 4 full 747s.

 

In 2010, there were 6 new posts, not bad for the first year! There were 3 pictures uploaded, taking up a total of 52 KB.

The busiest day of the year was October 28th with 53 views. The most popular post that day was One Gigabyte per Second In-Line Disk Data Compression Using an “Off-the-Shelf” Server.

Where did they come from?

The top referring sites in 2010 were exar.com, bridgestor.com, obama-scandal-exposed.co.cc, dev.exar.net, and healthfitnesstherapy.com.

Some visitors came searching, mostly for bitwackr, sharepoint deduplication, lsi sss6200, lsi sss6200 price, and deduplication sharepoint.

Attractions in 2010

These are the posts and pages that got the most views in 2010.

1. One Gigabyte per Second In-Line Disk Data Compression Using an “Off-the-Shelf” Server (August 2010)

2. Deduplicating Microsoft SharePoint Data (April 2010)

3. Hardware Compression for Disk and RAID Storage (June 2010)

4. Backup Deduplication vs. Application Storage Deduplication (April 2010)

5. The State of Primary Storage Data Deduplication (May 2010)


Written by BitWackr

January 2, 2011 at 3:41 pm

Posted in Uncategorized

Finally, the Arrival of Advanced Data Reduction (ADR) Appliances


BridgeSTOR, LLC, headquartered in Poway, CA and led by “serial entrepreneur” John Matze, has just launched a family of ADR appliances featuring primary and secondary data deduplication, compression, thin provisioning and optional, user-selectable encryption for data at rest (included as a standard feature). We certainly support the conclusion that these data reduction features, when delivered with little-to-no performance impact, produce highly compelling business cases for end-user buyers.

The BridgeSTOR VMware primary storage appliance (referred to as an “Application Optimized Storage” appliance, or “AOS” appliance) for example, when deployed with ESXi servers or the VMware vSphere Essentials Kit, is said to offer cost-effective server and storage virtualization for consolidation and management of branch offices, small and medium businesses and distributed enterprises.

BridgeSTOR appliances combine in-line, ASIC-assisted block-level data compression, deduplication and thin provisioning for VMware templates, images and application data. Typical data reduction for a combination of Linux and Windows virtual machine OS images is claimed to range from 75% to 90%.

The BridgeSTOR storage appliance for VMware delivers capacity-optimized iSCSI and NAS storage for use in a VMware vSphere 4.1 environment with multiple live virtual machines running capacity-intensive applications with mid-range IOPS requirements. Examples of applications for which these appliances are ideally suited include cloud storage, data mining, data warehousing, general business, R&D, geo-seismic, archive, VMware infrastructure and shared home directory applications.

Deduplication and compression reduce the storage capacity needs of VMware applications and significantly reduce the capacity requirements of VMware infrastructures, including template storage and “standby” VMs. They also optimize VMware Data Recovery (vDR) data, further reducing the size of disaster recovery information used by VMware.

The website goes on to say that BridgeSTOR appliances for VMware provide an integrated, shared storage capability that supports popular VMware-specific storage features including:

  • vMotion
  • Storage vMotion
  • DRS and HA clustering
  • Volume Grow (by enabling dynamic LUN resizing and increasing the size of the VMware datastore (VMFS) “live”)
  • Virtual and physical RAW LUN support

10 GbE Network I/O Virtualization

The other compelling bit of technology in the BridgeSTOR VMware appliance is called Virtual Input/Output – Network Attachment (VIO-NA), which provides the equivalent of a direct network connection to iSCSI SAN storage for virtual machines running in an ESX or ESXi environment. VIO-NA improves virtual machine performance and reduces the latency virtual machines experience when accessing SAN storage because it bypasses the VMware hypervisor, using hardware intelligence on the NIC.

Among the key benefits of ADR appliances when used in conjunction with VMware ESX and ESXi servers are said to be:

  • Support for an extensive array of VMware features and capabilities
  • VMware certification
  • Capacity optimization that reduces physical disk capacity requirements through a combination of in-line data deduplication and data compression
  • In-line storage capacity optimization with negligible performance penalty
  • RAID protection for critical data
  • Assured data integrity through hardware-generated CRCs that are stored with the data
  • Reduced capital expenses through fewer RAID arrays, filers and disk drives
  • Reduced operating expenses through lower power and cooling requirements
  • Data-at-rest security through strong encryption technology

BridgeSTOR Appliances for Backup

In addition to capacity optimization for primary and VMware storage, BridgeSTOR has added hardware data compression to the Backup Exec 2010 Deduplication Suite and packaged these features in a 2U HP-built appliance. The overall data reduction produced by Backup Exec 2010 is the product of the deduplication ratio and the compression ratio (send me an email at bob.farkaly@exar.com if you’d like to see the math). So when Backup Exec 2010 reports a data deduplication ratio of, say, 10:1, the effective data reduction ratio becomes 20:1 when 2:1 hardware compression is added. A dramatic improvement.
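
For anyone who would rather see the arithmetic right away, here is a minimal sketch of that multiplication using the 10:1 and 2:1 figures above (the function name and sample numbers are mine, purely for illustration):

    # Effective data reduction when deduplication and compression are combined.
    # Ratios are expressed as N, meaning N:1.
    def effective_reduction(dedup_ratio: float, compression_ratio: float) -> float:
        # Deduplication shrinks the data by dedup_ratio, then compression shrinks
        # what remains by compression_ratio, so the two effects multiply.
        return dedup_ratio * compression_ratio

    print(effective_reduction(10, 2))   # 20.0 -> a 10:1 dedup ratio with 2:1 compression yields 20:1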

Anyone “kicking the tires” on one of the well-known data deduplication products and a BridgeSTOR backup appliance will be pleasantly surprised to see how much further his or her budget can be stretched by walking past the better-known alternative.

Written by BitWackr

November 16, 2010 at 11:28 am

Posted in Uncategorized

One Gigabyte per Second In-Line Disk Data Compression Using an “Off-the-Shelf” Server


Exar’s BitWackr C Series adds disk data compression to “off-the-shelf” Windows and Linux servers. BitWackr C Series is a combination of software and specialized hardware (a data reduction ASIC installed on a PCIe card) that simply, effectively and affordably extends hardware-accelerated data compression to disk storage volumes. It is suitable for a broad range of Windows and Linux applications, including biomedical research, scientific data analysis, HPC visualization, HPC simulation, geo-seismic investigation, data warehousing, data analytics, backup to disk, NAS, email, archive and other storage-capacity-intensive applications, and it typically cuts both storage requirements and the energy required to power and cool disk storage by up to 80% for very little money.

Unlike data compression performed in software using general-purpose processors, BitWackr C Series reduces data and storage requirements without adding to CPU workload or system overhead. This frees CPU cycles to perform higher value computational tasks and does not elongate processing times. Offloading in-line data compression to a PCIe card translates into an imperceptible change in application performance or responsiveness.

A prospective customer expressed interest in equipping off-the-shelf servers with disk data compression that was capable of delivering 880 Megabytes per second of in-line data compression to a disk storage subsystem. We accepted this challenge, but we also thought it might be interesting to demonstrate in-line data compression using commodity server hardware at an even higher rate, so we set a target of writing data at one Gigabyte per second (1 GB/second or 1024 MB/second) and set out to demonstrate that performance level.

The test system was built around a Dell R710 server.

At the start of the test, we installed a single Exar Express DR1625 data reduction card in the Dell server as the compression hardware for the BitWackr C Series software. We next configured SATA SSDs as the “back end” data store and used LVM to create volumes to which the BitWackr C Series would write compressed data. The Linux “spew” command sent random data to each BitWackr-compressed volume.
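
The measurement itself is straightforward. The sketch below is a rough, hypothetical stand-in for what a tool like spew does: it writes a fixed number of buffers to a target path and reports aggregate throughput in MB/second (the path, buffer size and write count are placeholders, and a real benchmark would also use direct or synchronous I/O so the page cache does not inflate the numbers):

    import os, time

    TARGET   = "/mnt/bitwackr_vol/testfile"   # placeholder path on a BitWackr-compressed volume
    BUF_SIZE = 1024 * 1024                    # 1 MB per write
    COUNT    = 4096                           # 4 GB written in total

    buf = os.urandom(BUF_SIZE)   # generate the buffer once; note that reusing one buffer
                                 # makes the stream highly compressible, whereas spew keeps
                                 # varying the data, which is what you want when exercising
                                 # a data reduction layer
    start = time.time()
    with open(TARGET, "wb") as f:
        for _ in range(COUNT):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())     # make sure the data actually reached the volume
    elapsed = time.time() - start
    print("%.0f MB/second" % (BUF_SIZE * COUNT / elapsed / (1024 * 1024)))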

The first set of tests resulted in aggregate throughput in the range of 400 – 500 MB/second. BitWackr was performing as expected, but we observed that the performance constraint was the bandwidth of the storage sub-system.

Identifying a back-end data store for our tests that was capable of ingesting data at Gigabyte per second rates was problematic. In the past, Gigabyte per second rates required multiple Fibre Channel interface connections for data transfer to the storage sub-system. The SATA SSDs weren’t giving us the hoped-for performance, so we decided to overcome this obstacle by trying PCIe solid-state storage that claims to be capable of sustaining Gigabyte per second data rates.

In the next test, we replaced the SATA SSDs with a PCIe solid state device as the storage sub-system. We saw write throughput improve to between 500 and 550 MB/second. Thinking that we would probably not achieve our objective using this product, we tried something new.

The next test replaced the first PCIe solid-state device with a single LSI Solid State Storage Card (LSI SSS6200, http://www.lsi.com/storage_home/products_home/solid_state_storage/sss6200/index.html) to act as the storage sub-system. We immediately saw write throughput improve to about 800 MB/second. This was close to our customer’s requirement of 880 MB/second, but since our objective was to achieve an aggregate data rate of one Gigabyte per second, we pressed on.

We next installed a second LSI SSS6200 in the Dell server and used LVM to create volumes across both LSI cards. We saw a further performance increase to 1008 MB/second. But with our goal of one Gigabyte (1024 MB) per second within reach, we decided to run one more set of tests.

For the final tests, we configured two different volume groups on each LSI SSS6200. One volume group was configured for BitWackr metadata I/O and the other as the “back end” data store.

Using this configuration, we were able to measure an aggregate performance bandwidth of 1028 MB/second, thus exceeding our objective of one Gigabyte per second. The Linux “top” command confirmed that the Dell R710’s CPUs were coasting along because the “heavy lifting” of Gigabyte per second data compression was being effectively offloaded to the Exar DR1625 card. 

While demonstrating Gigabyte per second in-line disk data compression using commodity server hardware may sound a bit like a science fair project, there was sound business and technical justification for performing this test. We first wanted to prove to prospective server and appliance builders who are solving data-intensive challenges that in-line disk data compression at Gigabyte per second rates was achievable without the need for specialized server hardware.

Finally, we wanted to make sure that Exar customers understand that highly effective, high performance disk data compression for biomedical research, scientific data analysis, HPC visualization, HPC simulation, geo-seismic investigation, data warehousing, data analytics, backup to disk, NAS, email, archive and other storage-capacity-intensive applications can now be brought to market quickly, cost-effectively and with minimal development time or effort.

I believe we accomplished these objectives.  

Although our objective was to test and demonstrate BitWackr C Series performance, we also learned a great deal about the I/O characteristics of PCIe storage from two leading vendors. We needed a high performance storage back-end to support BitWackr’s operational capabilities and this class of storage device delivered the performance we needed.

Exar’s BitWackr C Series data compression is available to OEMs, SIs and VARs for immediate evaluation. The product has a list price of well under $1,000 for both the hardware and software in single unit quantities.

Written by BitWackr

August 13, 2010 at 1:06 pm

Posted in Uncategorized

Hardware Compression Multiplies Deduplication Ratios


I had dinner with Mr. Backup (Curtis Preston) the week before last at Legal Seafood in Boston in conjunction with the BD Event and took the opportunity to ask Curtis about several of his posts that emphasized the importance of deduplication ratios.

To summarize the various posts and cross posts, the issues on the table are 1) the quantity of disk to manage (monitor, replace on failure, etc.), power and cool, and 2) the effect on replication and bandwidth.

Both points are well taken, but there is one additional point to consider: The most relevant metric to use in these discussions isn’t solely the deduplication ratio – it’s the “data reduction ratio”. The data reduction ratio embraces the deduplication ratio but – most importantly – it recognizes the data compression ratio as a further component of data reduction.

Let me explain using block-level deduplication as the basis of this discussion. If a 100MB file (consisting of 1,600 unique 64KB blocks) is stored 5 times in 5 different directories (or with 5 different file names), a block deduplication system will reduce 1,600 x 5 blocks (8,000 blocks) to just 1,600 blocks (the blocks are assumed to be unique within the file, so there is no deduplication among the 1,600 blocks themselves). Hence, 500MB of file data deduplicates down to 100MB of data blocks with a resulting data deduplication ratio of 5:1.

But what happens if those same 1,600 blocks are compressed at a ratio of 2:1 at the same time they are being deduplicated but before they are stored on disk? The amount of data actually stored becomes ½ x 100MB, or just 50MB. And 50MB of stored data representing 500MB of non-deduplicated, uncompressed data computes to a data reduction ratio of 10:1. So the total data reduction ratio becomes the deduplication ratio multiplied by the compression ratio, making compression a data reduction effectiveness multiplier.
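
The same arithmetic as a small script, so the numbers above can be checked (the block size, block count and copy count come straight from the example; nothing here is specific to any particular product):

    BLOCK_KB    = 64
    BLOCKS      = 1_600     # 1,600 x 64KB = 100MB
    COPIES      = 5
    COMPRESSION = 2         # 2:1 compression

    logical_mb = BLOCK_KB * BLOCKS * COPIES / 1024   # 500 MB presented by the file system
    deduped_mb = BLOCK_KB * BLOCKS / 1024            # 100 MB of unique blocks
    stored_mb  = deduped_mb / COMPRESSION            # 50 MB actually written to disk

    print(logical_mb / deduped_mb)   # 5.0  -> 5:1 deduplication ratio
    print(logical_mb / stored_mb)    # 10.0 -> 10:1 data reduction ratio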

So if the data reduction ratio is, indeed, the operative metric, then compressing blocks as they are deduplicated should be an essential component of a comprehensive data reduction system.

Exar’s newly announced BitWackr C Series adds hardware compression (think compression at the tape drive level) to deduplicating backup-to-disk software from Symantec, CA Technologies, BakBone or CommVault. These are great backup products made even greater through the addition of hardware compression.

A description of the BitWackr C Series is available here.

Written by BitWackr

June 29, 2010 at 12:01 pm

Posted in Uncategorized

The State of Primary Storage Data Deduplication


The value of deduplication is immediately obvious in backup. Because of tradition (and operational laziness), people back up the same data day after day, week after week, month after month (etc., you get the idea). Multiple copies of the same data are obviously prime targets for deduplication. Deduplication is great for other applications, like NAS, for the same reason: people make and keep multiple copies of the same data because of tradition and operational laziness. Yet the value of data deduplication for primary storage is still not being talked about. It’s like the crazy aunt that’s kept locked up in the attic. We just don’t talk about duplicate data in NAS the way we talk about duplicate data in backup.

The evidence is so obvious that big companies, like NetApp, have grasped it. That’s why NetApp offers their deduplication with a guarantee (guarantee!) that it will save 1/2 of used storage. That’s a safe bet. The real savings are closer to 80%. Don’t other storage companies get it? Of course they do. But why kill the goose that’s busy laying golden eggs? Storage companies LOVE duplicate data because they sell the capacity to store it.

The revolution will start some day. It will take one big Tier 1 vendor to tell the dirty secret and let the crazy aunt out of the attic. Then the floodgates will open. Until then, it’s an uphill battle.

In the meantime, our small community (Greenbytes, Storwize, Ocarina and Exar) will keep evangelizing primary storage deduplication in our own small way.

But wait! Is there light at the end of the tunnel? Didn’t I just get an email solicitation from EMC about “Nearline deduplication” coupled with DataDomain? Maybe the start of the primary storage deduplication revolution is upon us!

Let’s see what happens.

Written by BitWackr

May 27, 2010 at 8:16 am

Posted in Uncategorized

Deduplicating Microsoft SharePoint Data


Microsoft SharePoint is a collaboration tool that helps improve business effectiveness through a combination of content management, information sharing and enterprise search. It provides IT professionals a platform and the tools needed to enable server administration and interoperability.

Microsoft SQL Server, a block storage-based database, is the engine that powers SharePoint.    

Exar’s Hifn Technology BitWackr (www.bitwackr.com) reduces the capacity required to store Microsoft Windows Server 2003 and 2008 data – including SharePoint data – through a combination of data deduplication and data compression.

BitWackr data reduction is not designed to deduplicate backup data. Deduplicating backup data is application-specific and dependent upon the data formats produced by backup software products.

The BitWackr reduces the amount of unstructured (application) data retained by an enterprise in volumes that contain primary (first copy) data for which performance is not the highest priority – the type of data held in low-to-medium-activity SharePoint storage volumes, for example.

Data Compression

Data compression is a technique that re-encodes data so that it takes up less storage space. Compression works by finding repeatable patterns of binary 0s and 1s, so the more patterns that can be found, the more the data can be compressed.

There are “lossy” and “lossless” forms of data compression. Lossy compression works on the assumption that the data doesn’t have to be stored perfectly: much information can simply be thrown away from images, video and audio data, and when decompressed such data will still be of acceptable quality. Lossless data compression is used when the data has to be restored exactly as it was before compression: if you compress a block and then decompress it, the block is unchanged. The BitWackr employs lossless compression techniques to ensure that data integrity is maintained as data is compressed and decompressed.

The metric employed in data compression is the “compression ratio”, the ratio of the size of the original uncompressed block to the size of the compressed block. For example, suppose a block of data before compression occupies 64 kilobytes (KB) of space. Using data compression, that block may be reduced in size to, say, 32 KB, halving the amount of capacity required to store it. In this case, data compression reduces the size of the block by a factor of two, resulting in a “compression ratio” of 2:1, or a “data reduction percentage” of 50%.

Some data can be highly compressed while other data will compress very slightly or even not at all. The amount of compression experienced depends on the type of data and the compression algorithm employed.
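
As a concrete illustration of how that ratio is calculated, here is a small sketch using the general-purpose zlib codec that ships with Python rather than the eLZS compression in the BitWackr hardware, so the exact numbers will differ:

    import zlib

    # A repetitive ~64KB block compresses well; random data would barely compress at all.
    block = b"SharePoint list item, revision 1\n" * 2048   # roughly 64KB of repetitive text
    compressed = zlib.compress(block)

    ratio = len(block) / len(compressed)
    print("compression ratio %.1f:1, data reduction %.0f%%"
          % (ratio, 100 * (1 - len(compressed) / len(block))))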

Data Deduplication

Data deduplication is a technique that eliminates redundant blocks of data. In a typical deduplication operation, blocks of data are “fingerprinted” using a hashing algorithm that produces a unique identifier for data blocks. These unique fingerprints along with the blocks of data that produced them are indexed, compressed and retained. Duplicate copies of data that have previously been fingerprinted are deduplicated, leaving only a single instance of each unique data block along with its corresponding fingerprint. 

The fingerprints along with their corresponding full data representations in a compressed form are retained to enable reconstituting the deduplicated block when the data is retrieved.

Some data can be aggressively deduplicated while other data will show little to no effect from deduplication. The level of deduplication experienced depends on the type of data being acted upon and the behavior of those storing the data.
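
A minimal sketch of that fingerprint-and-index loop appears below. It is not BitWackr’s implementation, just the general technique: fixed 64KB blocks, SHA-1 fingerprints, and a Python dictionary standing in for the index and data store.

    import hashlib, zlib

    BLOCK_SIZE = 64 * 1024
    store = {}        # fingerprint -> compressed unique block
    duplicates = 0    # count of blocks that were already in the store

    def write_block(block: bytes) -> bytes:
        """Fingerprint a block and store it only if it has not been seen before."""
        global duplicates
        fp = hashlib.sha1(block).digest()
        if fp in store:
            duplicates += 1                    # duplicate: reference the existing copy
        else:
            store[fp] = zlib.compress(block)   # unique: compress and retain it
        return fp                              # the fingerprint is kept to read the block back

    def read_block(fp: bytes) -> bytes:
        return zlib.decompress(store[fp])

Writing the same 64KB block five times leaves a single compressed copy in the store and a duplicate count of four, which is the behavior the paragraphs above describe.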

The Sequence in which Data Reduction is Performed

The BitWackr combines compression and deduplication to reduce the capacity required to store data. Encryption is an administrator-selected option that can be invoked at the time a BitWackr volume is created.

In deduplication systems other than BitWackr, an incoming block is first hashed to extract its fingerprint. Next, in a second step, the block is compressed. Finally, in a third distinct step, the compressed block is encrypted. Note that compression always precedes encryption because the role of encryption is to introduce randomness into the data while compression operates best on data with the least randomness.
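
The effect of getting this order wrong is easy to demonstrate. In the sketch below, the same block is compressed before and after being scrambled; os.urandom stands in for ciphertext, since well-encrypted data is statistically indistinguishable from random data (illustrative only, using zlib rather than eLZS and no real cipher):

    import os, zlib

    block = b"quarterly sales figures, region 7\n" * 2048   # compressible plaintext, roughly 64KB
    ciphertext_like = os.urandom(len(block))                 # stand-in for the encrypted block

    print(len(zlib.compress(block)))             # small: compression finds the repetition
    print(len(zlib.compress(ciphertext_like)))   # about the original size or slightly larger:
                                                 # the randomness introduced by encryption
                                                 # leaves compression nothing to work with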

Exar’s Hifn Technology DR1605 PCIe card – the hardware component of the BitWackr – performs SHA-1 hashing, eLZS compression and, if selected, AES-256 CBC encryption simultaneously in a single operation. The SHA-1 hash is used to determine whether the block being processed is unique or is a duplicate. If the block is unique, the compressed (and optionally encrypted) block is stored in the BitWackr Data Store. If the block is a duplicate, the appropriate counters are updated and the next block of data is processed.

By performing hashing and data transformation (compression and encryption) block operations simultaneously, the BitWackr reduces latency in the deduplication process. This is important because latency is the enemy of deduplication system performance. 

The Combined Effect of Deduplication and Compression

Data deduplication and compression work together to produce a combined data reduction effect. Depending upon the data, the BitWackr’s deduplication algorithms can yield data reduction on the order of 10 to 80 percent for unstructured application and SharePoint data.  Compression works on the deduplicated data load as well as on blocks that do not deduplicate, shrinking the amount of capacity required to store data by as much as an additional 66 percent, so the combined data reduction and storage capacity savings over time can range up to as much as 90 percent (caution – your mileage may vary).
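
Expressed as percentages rather than ratios, the two effects multiply on whatever data remains, which is how the figures above combine to roughly 90 percent. A back-of-the-envelope check, using values from the ranges quoted in the preceding paragraph:

    def combined_savings(dedup_pct: float, compression_pct: float) -> float:
        # Compression acts on whatever survives deduplication, so the
        # remaining fractions multiply: remaining = (1 - d) * (1 - c).
        return 1 - (1 - dedup_pct) * (1 - compression_pct)

    print(round(100 * combined_savings(0.70, 0.66)))   # ~90% combined reduction
    print(round(100 * combined_savings(0.80, 0.66)))   # ~93% at the top of the quoted ranges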

In order to quantify the relative effects of deduplication and compression on overall data reduction, we use a BitWackr utility program to disaggregate total data reduction into its components. Using typical business data, our observations show that about 2/3 of total BitWackr data reduction stems from data deduplication while the remaining 1/3 of the data reduction can be attributed to compression. 

Here’s a short YouTube video describing the BitWackr for Microsoft Windows Server 2008:

http://www.youtube.com/watch?v=r5qfygdiYGg

And here’s another short YouTube video describing BitWackr advanced data reduction for SharePoint:

http://www.youtube.com/watch?v=i4WI9d49U9A

Written by BitWackr

April 12, 2010 at 1:53 pm

Posted in Uncategorized

Backup Deduplication vs. Application Storage Deduplication


Since the DD200 announcement in June 2003, DataDomain (now an EMC company) has been at the forefront of backup deduplication technology and has made “deduplication” synonymous with “backup”.

Today, most products with “deduplication” in their name share a common characteristic: they are highly optimized for the backup application. And whatever logo your favorite deduplication product wears – DataDomain, IBM/Diligent, EMC, FalconStor, Atempo, Commvault, CA, Symantec or others – it counts on the data stream sent to the deduplication system being a backup data stream.

Data Deduplication

Block-based data deduplication is a technique that eliminates redundant blocks of data. In a typical deduplication operation, blocks of data are “fingerprinted” using a hashing algorithm that produces a unique, “shorthand” identifier for data blocks. These unique fingerprints along with the blocks of data that produced them are indexed and can be compressed and encrypted (if these functions are supported by the deduplication product) and retained. Duplicate copies of data that have previously been fingerprinted are deduplicated, leaving only a single instance of each unique data block along with its corresponding fingerprint.

The fingerprints along with their corresponding full data representations are stored (in an optionally compressed and encrypted form).

Saying this is easy. Doing it in the data path at speed with a “reasonable” amount of resources and with “reasonable” performance is the challenge.

Data deduplication, regardless of the application for which it is employed, has a number of common characteristics. First, data is ingested by the deduplication engine (in an appliance or in software running on a general-purpose server). The ingested data is operated on in blocks (some vendors use fixed-length blocks while others use variable-length blocks). Most vendors next run the blocks through a hashing engine that employs a cryptographic function (SHA-1, MD5, etc.) to produce the block’s hash fingerprint. Hash fingerprints are, for practical purposes, unique. A “hash collision” would occur if two different blocks of data generated the same hash. Such an event could cause a hash fingerprint to be associated with an incorrect block of data, resulting in an error when the data block was retrieved. Although there is a statistical possibility of a hash collision occurring, it is a risk much smaller than the risks storage administrators live with every day. So we feel safe in saying that the hash fingerprint uniquely identifies a specific block of data.
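
To put “much smaller than the risks storage administrators live with every day” into numbers: for a 160-bit hash such as SHA-1, the birthday bound puts the probability of any collision among n unique blocks at roughly n²/2^161. A quick illustrative calculation (the block count and block size are assumptions chosen for the example):

    HASH_BITS = 160            # SHA-1 produces a 160-bit fingerprint
    n = 2**32                  # ~4.3 billion unique 32KB blocks, about 128TB of unique data

    p_collision = n * (n - 1) / 2**(HASH_BITS + 1)   # birthday-bound approximation
    print(p_collision)         # roughly 6e-30: vanishingly small compared with everyday hardware risks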

Once the block fingerprint has been calculated, the deduplication engine has to compare the fingerprint against all the other fingerprints that have previously been generated to see whether this block is unique (new) or has been processed previously (a duplicate). It is the speed at which these index search and update operations are performed that is at the heart of a deduplication system’s throughput.

The reason is that, in an in-line deduplication engine, the amount of time it takes to make the decision whether a block is new and unique or is a duplicate that has to be deduplicated translates into latency. And latency is the enemy of deduplication performance. Some simple math tells us why.

Suppose a disk drive takes 5ms to access an index entry. In a serial process, this means that the index drive can process 200 index operations every second. If the I/Os to the deduplication system’s data store are 32KB in length, the upper limit of throughput would be 32KB x 200 operations per second or 6.5 MB/second.

In practice, performance can actually be much less than that, as multiple I/Os are typically required to add new fingerprints and update counters. One alternative would be to make the hash table memory-resident, but in order to represent any reasonable amount of capacity, the hash index will exceed the memory capacity of most servers, forcing it to reside on disk. So, returning to the discussion of disk random access performance – and assuming amazingly clever algorithms for getting to exactly the right index entry in a single I/O – deduplication performance still leaves much to be desired.
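
The back-of-the-envelope numbers behind both of these points, written out (the 32KB I/O size and 5ms access time come from the paragraphs above; the bytes-per-index-entry figure is an assumption of mine for illustration):

    SEEK_S      = 0.005                # 5ms per random index access
    BLOCK_BYTES = 32 * 1024            # 32KB I/Os
    iops        = 1 / SEEK_S           # 200 index lookups per second
    print(BLOCK_BYTES * iops / 1e6)    # ~6.5 MB/second if every block needs one disk-resident lookup

    # Why the index rarely fits in RAM: assume ~40 bytes per entry
    # (a 20-byte SHA-1 fingerprint plus pointers and counters -- an assumption, not a spec).
    capacity_tb = 10
    entries     = capacity_tb * 2**40 / BLOCK_BYTES
    print(entries * 40 / 2**30)        # ~12.5 GB of index just for 10TB of unique 32KB blocks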

There has to be a better way. Indeed, there are several better ways, and while backup deduplication vendors search for new and innovative ways to circumvent the hash index processing bottleneck, our research and development efforts have been focused on optimizing hash generation, lookup and table update for unstructured application data.

Written by BitWackr

April 7, 2010 at 9:48 pm

Posted in Uncategorized