My view on Tech: De-duplication: what is it?

This is the data storage technology in which I see a lot of promise for the future. Of course, there is a lot of confusion about that promise, so be aware that its potential depends on the type of application.

De-duplication is a disruptive technology that reverses the “duplication” of data in traditional backup environments. The technology can reduce the data to be backed up by an order of magnitude or more. Traditional backup solutions are burdened by a growing backup window and a growing amount of archived data. Compression techniques and retention policies have been implemented to address data growth. However, these methods do little to reduce the burden.

The de-duplication process works to reverse this data growth. The technology divides the data into segments, and only the segments that have been modified are backed up to secondary storage. Redundant segments are identified by a commonality factor and are not included in the backup. A typical de-dupe application is expected to deliver around 200-500x data reduction in a backup environment, though the actual ratio depends on the data and on how much of it changes between backups. This technology is disruptive to the current backup environment and will play a major role in the coming 3-5 years.
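To make the segment idea concrete, here is a minimal sketch in Python. It is not how any particular product works: the fixed 4-byte segment size, the SHA-256 fingerprints, and the `SegmentStore` class are all my own assumptions for illustration (real products use much larger segments, often with variable boundaries).

```python
import hashlib

SEGMENT_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow (assumption)


def split_segments(data, size=SEGMENT_SIZE):
    """Cut the data into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class SegmentStore:
    """Hypothetical secondary storage that keeps each unique segment once."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes

    def backup(self, data):
        """Store only unseen segments; return (recipe, bytes newly written)."""
        recipe = []
        new_bytes = 0
        for seg in split_segments(data):
            fp = hashlib.sha256(seg).hexdigest()
            if fp not in self.segments:  # segment is new or modified
                self.segments[fp] = seg
                new_bytes += len(seg)
            recipe.append(fp)  # the recipe lets us rebuild the backup later
        return recipe, new_bytes

    def restore(self, recipe):
        """Rebuild a backup from its list of fingerprints."""
        return b"".join(self.segments[fp] for fp in recipe)


store = SegmentStore()
monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAABBBBXXXXDDDD"  # only the third segment changed

recipe_mon, written_mon = store.backup(monday)
recipe_tue, written_tue = store.backup(tuesday)
print(written_mon, written_tue)  # 16 4
```

The second backup writes only the one changed segment, yet both backups can still be restored in full from their recipes.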

De-duplication saves WAN bandwidth and slows the growth of secondary storage by reducing the overall data in the backup process. In effect, it reduces network bandwidth costs, secondary storage costs, support costs and installation costs. De-duplication can be performed at the source or at the target of the backup data. Source de-dupe reduces network bandwidth as well as secondary storage, whereas target de-dupe reduces only secondary storage. EMC’s Avamar is an example of source-based de-duplication, and the Data Domain appliance series is an example of target-based de-duplication.
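The difference between source and target de-dupe is easiest to see by counting bytes. The sketch below is a toy model under my own assumptions, not a description of Avamar or Data Domain: segments are 1024 bytes, fingerprints are 32-byte SHA-256 digests, and the source is assumed to send one fingerprint per segment to ask whether the target already holds it.

```python
import hashlib


def fingerprint(segment):
    return hashlib.sha256(segment).digest()  # 32 bytes


def target_side_backup(segments, stored):
    """Ship every segment over the WAN; de-dupe only when writing at the target."""
    wan_bytes = sum(len(s) for s in segments)
    for s in segments:
        stored.setdefault(fingerprint(s), s)
    return wan_bytes


def source_side_backup(segments, stored):
    """Ship a fingerprint per segment, plus the segment only if the target lacks it."""
    wan_bytes = 0
    for s in segments:
        fp = fingerprint(s)
        wan_bytes += len(fp)
        if fp not in stored:
            wan_bytes += len(s)
            stored[fp] = s
    return wan_bytes


def stored_bytes(stored):
    return sum(len(s) for s in stored.values())


first = [bytes([i]) * 1024 for i in range(10)]  # ten distinct segments
second = first[:9] + [b"\xff" * 1024]           # next backup: one segment changed

target_store, source_store = {}, {}
target_side_backup(first, target_store)
source_side_backup(first, source_store)

target_wan = target_side_backup(second, target_store)
source_wan = source_side_backup(second, source_store)

print(target_wan, source_wan)                                  # 10240 1344
print(stored_bytes(target_store), stored_bytes(source_store))  # 11264 11264
```

Both approaches end up storing the same amount, but only the source-side variant avoids sending the nine unchanged segments across the network.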

Major storage players are focusing on de-duplication. Many backup software vendors have started working on de-dupe solutions as part of their offerings. Archiving vendors, such as VTL vendors, are integrating de-dupe into their archiving solutions.

De-duplication is a technology that can go beyond backup and archiving environments. Data reduction is a desired functionality in areas such as replication, and de-duplication in the replication market is an untapped opportunity. It can reduce the data sent to the remote site in remote replication and increase the performance of asynchronous mirroring applications. Because the technology is at an early stage, replication vendors haven’t fully embraced de-duplication.

Many de-duplication solutions focus on data reduction by tracking data changes over time, which can be referred to as temporal de-duplication. NetApp introduced a Single-Instance-Storage de-duplication solution that takes advantage of the commonality of data within storage, which can be referred to as spatial de-duplication. Spatial de-duplication is a relatively new concept and a potential area for vendors to tap new opportunities. It could reduce primary as well as secondary storage needs for certain applications.
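As a rough illustration of the spatial case, the sketch below finds identical content across different files that exist at the same point in time and counts how much space a single stored copy would save. This is only a whole-file toy under my own assumptions (the `single_instance` function and the file names are hypothetical), and it is not NetApp's implementation.

```python
import hashlib
from collections import defaultdict


def single_instance(files):
    """Group file names by content hash; return (groups, logical, physical) sizes."""
    by_content = defaultdict(list)
    for name, content in files.items():
        by_content[hashlib.sha256(content).hexdigest()].append(name)
    logical = sum(len(content) for content in files.values())
    # Physically, one copy per distinct content is enough.
    physical = sum(len(files[names[0]]) for names in by_content.values())
    return by_content, logical, physical


files = {
    "alice/report.doc": b"Q1 report" * 100,  # 900 bytes
    "bob/report.doc": b"Q1 report" * 100,    # same content, different owner
    "carol/notes.txt": b"notes" * 10,        # 50 bytes
}

groups, logical, physical = single_instance(files)
print(logical, physical)  # 1850 950
```

No history is involved here: the saving comes purely from copies that sit side by side in the same storage, which is why it can apply to primary storage as well.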

De-duplication will be a game-changing technology in the coming 3-5 years. Even though it is gaining traction mainly in backup environments, the technology has potential in many areas where data reduction is desired. Both temporal and spatial de-duplication have advantages in certain application environments.


Monday, April 7, 2008
