Updated 2021 by the data privacy team
A common misconception among data governance and data privacy stewards evaluating data anonymization solutions is that encryption for data security is a form of data masking. Data masking and data encryption are technically distinct data privacy solutions, although encryption applied to an individual structured data field can serve as a data masking function. Both can help address regulatory compliance, such as the GDPR and CCPA, and other data privacy use cases, such as protecting big data analytics to reduce data exposure risks.
Data masking and data encryption share many similarities as data privacy solutions, but the differences are substantial. Each is designed to help ensure data protection, and protection can improve when both are used together.
Fundamental differences between data masking and data encryption
Encryption is typically applied to data at rest or to data links (data in motion) where real-time usability is not needed, such as long-term data storage or data transfers. Traditional key management lets encrypted data be transformed back into readable cleartext when applications need it, but it is not a nuanced data privacy solution: it operates on whole files or data volumes, such as those found in data archives.
Data masking is often considered data-centric security, as it persists with the data when moved and used by hiding data elements that users of certain applications should not see. Persistent data masking replaces sensitive data with similar-looking proxy data, which are typically randomly generated characters that will meet the requirements of a system designed to test or process the masked results.
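As a minimal sketch of the persistent masking idea described above, the following Python function replaces each letter or digit with a randomly chosen character of the same kind, so the proxy value keeps the shape of the original. The function name `mask_persistent` and the sample value are hypothetical and chosen only for illustration; a real masking product would apply many more rules.

```python
import secrets
import string


def mask_persistent(value: str) -> str:
    """Replace each letter or digit with a random one of the same kind.

    Separators such as '-' are kept so the result still fits the
    original field format. No key or lookup table is retained, so the
    original value cannot be recovered from the output.
    """
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(secrets.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            masked.append(secrets.choice(pool))
        else:
            masked.append(ch)  # keep punctuation and spacing as-is
    return "".join(masked)


# Hypothetical sample value; the output differs on every run.
print(mask_persistent("123-45-6789"))
```

Because the replacement characters are random, a test system that validates only the field format should accept the proxy value, while the output carries no information about the original digits.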
Data masking ensures that vital parts of personally identifiable information (PII) and other confidential data, like the first five digits of a social security number, are obscured or otherwise de-identified. “Dynamic” data masking goes a step further and transforms data on the fly, based on the user's role (privileges) at the time of the request. It is used to secure real-time transactional systems, and it speeds up data privacy management, regulatory compliance implementation, and ongoing data governance policies.
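The role-based behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product works: the role names, the `FULL_ACCESS_ROLES` set, and the function `mask_ssn_for_role` are all hypothetical.

```python
# Hypothetical set of roles allowed to see the full value.
FULL_ACCESS_ROLES = {"compliance_officer"}


def mask_ssn_for_role(ssn: str, role: str) -> str:
    """Return the SSN as the given role is allowed to see it."""
    if role in FULL_ACCESS_ROLES:
        return ssn
    # Hide the first five digits; keep separators and the last four digits.
    digits_seen = 0
    out = []
    for ch in ssn:
        if ch.isdigit():
            digits_seen += 1
            out.append("X" if digits_seen <= 5 else ch)
        else:
            out.append(ch)
    return "".join(out)


print(mask_ssn_for_role("123-45-6789", "support_agent"))       # XXX-XX-6789
print(mask_ssn_for_role("123-45-6789", "compliance_officer"))  # 123-45-6789
```

The stored value is never changed here; only the view returned to the caller differs, which is what distinguishes dynamic masking from the persistent approach.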
Data masking does not necessarily need to obscure all the information in a record. Records are seen in their native format, and no decryption key is necessary, while less sensitive data attributes stay in the clear (e.g., the bank routing number, but not the account number). In essence, users see what they are allowed to see today, and not a byte more. Tomorrow they may see even less if the rules change overnight due to evolving data privacy laws, since data masking can operate at a fine-grained level to limit inappropriate uses. Ideally, the resulting data set does not contain any references to the original information, which makes it useless to attackers attempting a data security breach and to insiders who could inadvertently expose data.
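To make the fine-grained, rule-driven nature of masking concrete, here is a small Python sketch in which a set of visible field names acts as the policy. The record, the field names, and the `apply_policy` helper are hypothetical. Masking with asterisks is a simplification, since real tools often substitute realistic proxy values instead.

```python
def apply_policy(record: dict, visible_fields: set) -> dict:
    """Return a copy of the record with every non-visible field masked."""
    return {
        field: value if field in visible_fields else "*" * len(str(value))
        for field, value in record.items()
    }


# Hypothetical record and policies, for illustration only.
record = {
    "name": "Ana Ruiz",
    "routing_number": "123456789",
    "account_number": "9876543210",
}
policy_today = {"name", "routing_number"}
policy_tomorrow = {"routing_number"}  # stricter rules after a policy change

print(apply_policy(record, policy_today))
print(apply_policy(record, policy_tomorrow))
```

Tightening the policy only requires changing the set of visible fields; the stored record and the masking code stay the same.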
Data encryption transforms information with an algorithm (a cipher), such as AES-256, RSA, or the older DES, into scrambled ciphertext that is unreadable to anyone who does not possess the key. Restoring the message requires the corresponding decryption algorithm and the appropriate key. Encryption is widely used to protect files and volumes on a local, network, or cloud data repository, as well as network communications such as SSL/TLS sessions and web or email traffic.
When to use data masking vs data encryption
Data masking is often used by those who need to test with sensitive data or perform research and development on sensitive projects, and who would therefore prefer to operate on a desensitized proxy value of the original data to minimize risk exposure. Application development teams commonly require production data for testing purposes, but this increases risk exposure. Because this sensitive data passes through many hands, it is at significant risk of theft in a data breach or of misuse with regard to privacy policies. Redacting (stripping, covering over, or removing) the important attributes of the data set, such as names, addresses, patient information, and other privacy-regulated data, balances data protection with usability. This process, however, is often irreversible, depending on the data masking technique being applied.
Common terms such as data anonymization and de-identification refer to processes that irreversibly sever the identifying information from the data set, for example by destroying the data encryption key. By contrast, pseudonymization implies that the de-identification can be reversed, ideally only when the data use is authorized. Data anonymization prevents future identification of the original data, even by the people conducting the research or testing. For example, one cannot discern or re-identify a social security number that is exposed with its first five digits covered by randomized characters.
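The difference between pseudonymization and anonymization can be sketched as follows. This is a simplified illustration, assuming an in-memory lookup table stands in for whatever protected mapping a real system would keep; the class `PseudonymVault` and the function `anonymize` are hypothetical names.

```python
import secrets


class PseudonymVault:
    """Maps real values to random tokens and back (reversible)."""

    def __init__(self):
        self._value_to_token = {}
        self._token_to_value = {}

    def pseudonymize(self, value: str) -> str:
        # Reuse the existing token so the same person stays linkable.
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def reidentify(self, token: str) -> str:
        # Only possible while the mapping exists and access is authorized.
        return self._token_to_value[token]


def anonymize(value: str) -> str:
    """Return a random token and keep no mapping (irreversible)."""
    return "anon_" + secrets.token_hex(8)


vault = PseudonymVault()
token = vault.pseudonymize("123-45-6789")
print(vault.reidentify(token))   # the original value comes back
print(anonymize("123-45-6789"))  # nothing links this token to the input
```

Deleting the vault turns the pseudonymized tokens into anonymized ones, which parallels the idea of destroying the key.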
Data encryption is often used to protect data transferred between computers or networks, or stored at rest long-term, so that it can be restored later. Such data, whether in transit or at rest, is vulnerable to a data breach if an attacker gains access to the data encryption keys. Converting data into unreadable ciphertext creates highly secure results when strong data encryption algorithms are used and keys are protected in hardened devices, such as key management appliances. The only way to gain access to the data is to unlock it with a key, which only authorized parties can access. However, as mentioned, data encryption sacrifices data usability because it protects entire files or data volumes at a coarse-grained level and requires decryption before the data can be used.
From a data privacy and data security point of view, the best persistent data masking solution masks data with randomly generated characters, because it does not depend on key management or require a key to be retained. Traditional data encryption, by contrast, is not well suited to reliable data masking: it is not fine-grained enough for records to stay usable, and it depends on a data encryption key that creates risk exposure if it falls into the wrong hands. Reversibility is not needed for most data privacy requirements, so the concept of “one input, one output” (a 1‐1 map) can be abandoned, along with the concept of determinism. Abandoning these two core principles of data encryption allows secure data masking solutions to support data anonymization requirements.
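The contrast between a deterministic, keyed transformation and random masking can be illustrated with the short Python sketch below. Two assumptions are worth stating plainly: the standard library has no cipher, so HMAC-SHA256 (a keyed hash, not encryption) is used here only as a stand-in for a deterministic keyed mapping, and the key, function names, and sample value are hypothetical.

```python
import hashlib
import hmac
import secrets
import string

# Hypothetical demo key; a keyed scheme is only as safe as this secret.
KEY = b"demo-key-not-for-production"


def deterministic_token(value: str) -> str:
    """Keyed and deterministic: the same input always gives the same output."""
    digest = hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]


def random_mask(value: str) -> str:
    """Keyless and non-deterministic: digits are replaced at random."""
    return "".join(
        secrets.choice(string.digits) if ch.isdigit() else ch for ch in value
    )


ssn = "123-45-6789"
print(deterministic_token(ssn) == deterministic_token(ssn))  # True
print(random_mask(ssn), random_mask(ssn))  # two unrelated proxy values
```

With the random approach there is no key to protect and no function that maps a masked value back to its source, which is the property the paragraph above relies on for anonymization.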
In short, if the goal is to protect non-production data from unauthorized access and the data must be retained long term, then use data encryption.
However, if you need to use production data in a test environment or real-time applications, where the content of the data can be precisely redacted to reduce its sensitivity to data privacy risk exposure, then use data masking. Not only can data masking be more secure than data encryption when using persistent data masking approaches, users may also find it to be a more efficient process.
It may be easy to think of data masking and data encryption as the same, since both are potentially data-centric means of protecting specific classes of sensitive data. However, their inherent procedures and use cases differentiate them. Either way, an investment in a secure environment helps preserve company reputation and customer loyalty for years to come.
Get more information on data masking and data encryption
Informatica helps organizations by offering two types of data masking. One option, persistent data masking, is typically used on test data to simulate production data with lower risk – or to mask attributes in records that do not need reversing.
The second option is dynamic data masking. As the name implies, it is a temporary data masking technique that takes the context of use into account, depending on the application or user requirements. Each can be a useful tool for data anonymization or data pseudonymization to meet data privacy policies for GDPR, CCPA, or other privacy regulations that mandate data protection.
Also, be sure to check out the Informatica Data Privacy Management solution.