Deduplication is the process of removing redundant or repeated information from stored data. It improves data quality and storage utilization, simplifies ETL, and makes data transfers more efficient.
Duplicate data is a particular problem in organizations that use many hosted business applications. For example, a hosted customer relationship management (CRM) application will create records for customers who already exist in on-premises applications. Data-savvy organizations reconcile the two sets of records so that the business has a single, comprehensive source of customer information.
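As a rough illustration of reconciling two customer sources, the following Python sketch merges records from a hosted CRM and an on-premises system into one record per customer. The field names (`email`, `name`, `phone`), the choice of email as the matching key, and the rule that on-premises values win are all assumptions made for this example, not a prescribed method.

```python
def normalize_email(email: str) -> str:
    """Lowercase and trim an email so trivially different spellings match."""
    return email.strip().lower()


def merge_customers(crm_records, on_prem_records):
    """Merge two lists of customer dicts into one record per normalized email.

    Assumption: on-premises records are processed first, so their non-empty
    values are kept; CRM values only fill in fields that are missing or empty.
    """
    merged = {}
    for record in list(on_prem_records) + list(crm_records):
        key = normalize_email(record["email"])
        target = merged.setdefault(key, {"email": key})
        for field, value in record.items():
            if field == "email":
                continue
            # Keep the first non-empty value seen for each field.
            if value and not target.get(field):
                target[field] = value
    return list(merged.values())


# Hypothetical sample data: the same customer appears in both systems.
on_prem = [{"email": "Ana@Example.com", "name": "Ana Silva", "phone": ""}]
crm = [
    {"email": "ana@example.com ", "name": "A. Silva", "phone": "555-0100"},
    {"email": "bo@example.com", "name": "Bo Chen", "phone": "555-0101"},
]

customers = merge_customers(crm, on_prem)
for customer in customers:
    print(customer)
```

Exact matching on a normalized key only catches trivially different spellings. Records that differ in less obvious ways, such as a changed email address, would need additional matching rules.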
Organizations often lack visibility into the sources and causes of redundant data, so they have no way of knowing how much it is costing them. For example, a retailer can waste a lot of money sending multiple copies of the same catalog or campaign to one prospective customer. By deduplicating the data ahead of time, the company can prevent that waste.
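To make the mailing example concrete, here is a minimal Python sketch that deduplicates a mailing list and estimates what the redundant entries would have cost. The field names, the matching rule (name and address compared without regard to case or spacing), and the per-piece cost are hypothetical values chosen for illustration.

```python
from collections import Counter


def mailing_key(row):
    """Build a comparison key from name and address, ignoring case and extra spaces."""
    return tuple(" ".join(row[field].lower().split()) for field in ("name", "address"))


def redundancy_report(rows, cost_per_piece_cents):
    """Return (unique rows, number of redundant rows, wasted cost in cents)."""
    counts = Counter(mailing_key(row) for row in rows)
    # Every occurrence beyond the first for a given key is redundant.
    redundant = sum(n - 1 for n in counts.values())

    seen = set()
    unique_rows = []
    for row in rows:
        key = mailing_key(row)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows, redundant, redundant * cost_per_piece_cents


# Hypothetical mailing list with one prospect entered twice.
mailing_list = [
    {"name": "Dana Lee", "address": "12 Oak St"},
    {"name": "dana  lee", "address": "12 oak st"},
    {"name": "Sam Roy", "address": "9 Elm Ave"},
]

unique_rows, redundant, wasted_cents = redundancy_report(mailing_list, cost_per_piece_cents=150)
print(len(unique_rows), redundant, wasted_cents)
```

Running a report like this before a campaign gives a rough figure for the cost of duplicates, under the assumption that each redundant row would have produced one extra mailing.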