When trying to protect your data from the nefarious souls that would like access to it (?), there are several options available that apply to very specific use cases. In order for us to talk about the different solutions - it is important to define all of the terms:
Each of the three methods for protecting data (encryption, tokenization and data masking) have different benefits and work to solve different security issues . We'll address them in a bit. For a visual representation of the three methods – please see the table below:
Original Value | Encrypted | Tokenized | Masked | |
Last Name | johnson | 8UY%45Sj | wjehneo | simpson |
First Name | margaret | 3%ERT22##$ | owhksoes | marge |
SSN | 585-88-9874 | Mh9&o03ms)) | 93nmvhf93na | 345-79-4444 |
Encryption
For protecting PHI data - encryption is superior to tokenization. You encrypt different portions of personal healthcare data under different encryption keys. Only those with the requisite keys can see the data. This form of encryption requires advanced application support to manage the different data sets to be viewed or updated by different audiences. The key management service must be very scalable to handle even a modest community of users. Record management is particularly complicated. Encryption works better than tokenization for PHI - but it does not scale well. Properly deployed, encryption is a perfectly suitable tool for protecting PII. It can be set up to protect archived data or data residing on file systems without modification to business processes.
Tokenization
For tokenization of PHI - there are many pieces of data which must be bundled up in different ways for many different audiences. Using the tokenized data requires it to be de-tokenized (which usually includes a decryption process). This introduces an overhead to the process. A person's medical history is a combination of medical attributes, doctor visits, outsourced visits. It is an entangled set of personal, financial, and medical data. Different groups need access to different subsets. Each audience needs a different slice of the data - but must not see the rest of it. You need to issue a different token for each and every audience. You will need a very sophisticated token management and tracking system to divide up the data, issuing and tracking different tokens for each audience.
Data Masking
Masking can scramble individual data columns in different ways so that the masked data looks like the original (retaining its format and data type) but it is no longer sensitive data. Masking is effective for maintaining aggregate values across an entire database, enabling preservation of sum and average values within a data set, while changing all the individual data elements. Masking plus encryption provide a powerful combination for distribution and sharing of medical information
Traditionally, data masking has been viewed as a technique for solving a test data problem. The December 2014 Gartner Magic Quadrant Report on Data Masking Technology extends the scope of data masking to more broadly include data de-identification in production, non-production, and analytic use cases. The challenge is to do this while retaining business value in the information for consumption and use.
Masked data should be realistic and quasi-real. It should satisfy the same business rules as real data. It is very common to use masked data in test and development environments as the data looks like "real" data, but doesn't contain any sensitive information.