Data masking or data obfuscation is the process of de-identifying (masking) specific data elements within data stores.
The main reason for masking a data field is to protect data classified as personally identifiable, personally sensitive or commercially sensitive. The data must nevertheless remain usable for valid test cycles, and it must look real and appear consistent. Masking is most commonly applied to data held outside a corporate production system, that is, where data is needed for application development, building program extensions and conducting test cycles. It is common practice in enterprise computing to copy data from production systems to populate these non-production environments. However, the practice is not restricted to non-production environments. In some organisations, data shown on terminal screens to call centre operators may be masked dynamically according to the user's security permissions (for example, preventing call centre operators from viewing credit card numbers in billing systems).
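The call centre case can be illustrated with a minimal Python sketch of permission-based dynamic masking. The permission name view_full_card_number, the function names and the choice to leave the last four digits visible are assumptions made for this illustration; they are not taken from any particular billing system.

```python
from typing import AbstractSet

# Hypothetical permission name, chosen only for this sketch.
FULL_VIEW_PERMISSION = "view_full_card_number"


def mask_all_but_last(card_number: str, visible: int = 4) -> str:
    """Replace every digit except the last `visible` digits with '*'.

    Separators such as spaces or hyphens are kept so the layout looks real.
    """
    total_digits = sum(ch.isdigit() for ch in card_number)
    to_hide = max(total_digits - visible, 0)
    masked = []
    for ch in card_number:
        if ch.isdigit() and to_hide > 0:
            masked.append("*")
            to_hide -= 1
        else:
            masked.append(ch)
    return "".join(masked)


def render_card_number(card_number: str, permissions: AbstractSet[str]) -> str:
    """Return the value to display, masked unless the user may see it in full."""
    if FULL_VIEW_PERMISSION in permissions:
        return card_number
    return mask_all_but_last(card_number)


if __name__ == "__main__":
    print(render_card_number("4111 1111 1111 1111", {"view_billing"}))
    print(render_card_number("4111 1111 1111 1111", {FULL_VIEW_PERMISSION}))
```

In this sketch the stored value is never altered; masking happens only when the value is rendered, which is what distinguishes dynamic masking from masking a copied data set.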
The primary concern from a corporate governance perspective is that personnel working in these non-production environments do not always hold the security clearance required to handle the information contained in the production data. This creates a security hole: data can be copied by unauthorised personnel, and the security measures associated with standard production-level controls can easily be bypassed. It is therefore an access point for a data security breach.
A key requirement for any data masking and obfuscation practice is that the data must remain meaningful at several levels.
Firstly, it must remain meaningful for the application logic. For example, if elements of addresses are to be obfuscated and cities and suburbs are replaced with substitute cities or suburbs, then any application feature that validates postcodes or performs a postcode lookup must still operate without error and as expected. The same is true for credit card algorithm validation checks and Social Security number validations.
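As an illustration of keeping masked data valid for application logic, the following Python sketch replaces a card number with random digits and then recomputes the final check digit so that the result still passes the Luhn checksum, the validation commonly applied to credit card numbers. The function names are hypothetical, and the sketch assumes the input is a string of digits only. It does not preserve the issuer prefix, which a real masking routine might need to do.

```python
import random


def luhn_is_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def luhn_check_digit(partial: str) -> str:
    """Compute the digit that makes `partial + digit` Luhn-valid."""
    total = 0
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:  # these positions are doubled once the check digit is appended
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def substitute_card_number(real: str, rng: random.Random) -> str:
    """Return a random number of the same length that still passes Luhn."""
    body = "".join(str(rng.randrange(10)) for _ in range(len(real) - 1))
    return body + luhn_check_digit(body)


if __name__ == "__main__":
    masked = substitute_card_number("4111111111111111", random.Random())
    print(masked, luhn_is_valid(masked))
```

Because the check digit is recomputed rather than randomised, an application that validates card numbers should accept the substituted value in the same way as the original.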
Secondly, the data must be sufficiently treated so that it is not obvious that the masked data comes from a source of production data. For example, it may be common knowledge in an organisation that there are 10 senior managers all earning in excess of $300K. If a test environment of the organisation's HR system also contains 10 identities in the same earning bracket, then other information could be pieced together to reverse engineer a real-life identity. In theory, if the data is obviously masked or obfuscated, someone intent on a data breach could reasonably assume that they could reverse engineer identity data, provided they had some knowledge of the identities in the production data set. For this reason, data obfuscation or masking of a data set is conducted so as to ensure that identity and sensitive data records as a whole are protected, not just the individual data elements in discrete fields and tables.