Data masking overview
Data masking protects sensitive information by blocking unauthorized users from accessing the real data. This process creates altered versions of data for specific uses, like presentations, sales demonstrations, or software testing. The masked data keeps the same format as the original but contains changed values that cannot be reversed to reveal the true information. By making the data worthless to outsiders, masking helps organizations reduce their risk of data breaches or misuse. Companies can safely use masked data in various scenarios without exposing confidential details to unauthorized parties.
Data masking in Percona Server for MySQL is an essential tool for protecting sensitive information in various scenarios:
Scenario | Description |
---|---|
Protecting data in development and testing | Developers and testers require realistic data to validate applications. Masking sensitive details, such as credit card numbers, Social Security numbers, and addresses, keeps real user information out of non-production environments. |
Compliance with data privacy regulations | Stringent laws like GDPR, HIPAA, and CCPA mandate the protection of personal data. Data masking enables the anonymization of personal information, facilitating its use for analysis and reporting while ensuring compliance with regulations. |
Securing data when collaborating with external entities | Before you share data with third-party vendors, mask sensitive information so that the vendors cannot access real personal details. |
Supporting customer service and training | Customer support teams and trainers often require access to customer data. Through data masking, they can utilize realistic information without compromising actual customer details. |
Facilitating data analysis and reporting | Analysts rely on access to data for generating reports and uncovering insights. By employing data masking techniques, they can work with realistic data sets without compromising privacy. |
These examples underscore how data masking serves as a crucial safeguard for sensitive information, allowing organizations to leverage their data effectively across diverse functions.
Version updates
Percona Server for MySQL 8.0.41 introduces an internal term cache for the dictionary-based functions in the data masking component.
Instead of querying the underlying `mysql.masking_dictionaries` table each time a function is executed, the server now utilizes internal in-memory data structures for lookups. This enhancement significantly improves performance, particularly when processing multiple rows.
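The effect of such a cache on lookups can be illustrated with a small Python model. This is only a conceptual sketch, not the server's implementation: the `DictionaryTable` class, its `query_count` counter, and both lookup helpers are hypothetical names invented for this illustration.

```python
class DictionaryTable:
    """Hypothetical stand-in for the mysql.masking_dictionaries table."""

    def __init__(self, rows):
        self.rows = set(rows)      # (dictionary, term) pairs
        self.query_count = 0       # how many times the "table" was queried

    def select_terms(self, dictionary):
        self.query_count += 1
        return sorted(term for name, term in self.rows if name == dictionary)


def lookup_without_cache(table, dictionary, n_rows):
    # One table query for every processed row.
    return [table.select_terms(dictionary) for _ in range(n_rows)]


def lookup_with_cache(table, dictionary, n_rows):
    # The first row loads the terms into memory; later rows reuse them.
    cache = {}
    results = []
    for _ in range(n_rows):
        if dictionary not in cache:
            cache[dictionary] = table.select_terms(dictionary)
        results.append(cache[dictionary])
    return results


table = DictionaryTable([("cities", "Lima"), ("cities", "Oslo")])

lookup_without_cache(table, "cities", 1000)
uncached_queries = table.query_count

table.query_count = 0
lookup_with_cache(table, "cities", 1000)
cached_queries = table.query_count

print(uncached_queries, cached_queries)  # 1000 1
```

In this model, processing 1000 rows costs 1000 table queries without the cache and a single query with it, which is the kind of saving the paragraph above describes for multi-row processing.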
With this redesign, the internal dictionary term cache might get out of sync with the underlying dictionaries table (by default, `mysql.masking_dictionaries`). This can happen if you change the table directly instead of using the dedicated dictionary manipulation functions: [`masking_dictionary_term_add()`](data-masking-function-list.md#masking_dictionary_term_adddictionary_name-term_name), `masking_dictionary_term_remove()`, and `masking_dictionary_remove()`.
To resync the internal dictionary term cache, we added a new function called `masking_dictionaries_flush()`. This function takes no arguments and returns 1 when it succeeds.
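The out-of-sync scenario and the resync can be modeled in a few lines of Python. This is a toy model under stated assumptions, not the component's code: the `MaskingComponentModel` class, the `direct_insert()` method, and `cached_terms()` are hypothetical, and the methods that reuse the real function names only imitate the behavior described above (dedicated functions keep the cache current, direct table changes do not).

```python
class MaskingComponentModel:
    """Toy model: a dictionaries table plus an in-memory term cache."""

    def __init__(self):
        self.table = set()   # stands in for mysql.masking_dictionaries
        self.cache = set()   # stands in for the internal term cache

    def masking_dictionary_term_add(self, dictionary, term):
        # Assumption: the dedicated function updates table and cache together.
        self.table.add((dictionary, term))
        self.cache.add((dictionary, term))

    def direct_insert(self, dictionary, term):
        # Models a direct INSERT into the table, which bypasses the cache.
        self.table.add((dictionary, term))

    def masking_dictionaries_flush(self):
        # Reload the cache from the table; returns 1 on success.
        self.cache = set(self.table)
        return 1

    def cached_terms(self, dictionary):
        return sorted(term for name, term in self.cache if name == dictionary)


model = MaskingComponentModel()
model.masking_dictionary_term_add("cities", "Oslo")
model.direct_insert("cities", "Lima")            # cache is now stale

stale = model.cached_terms("cities")             # ['Oslo']
flush_result = model.masking_dictionaries_flush()
fresh = model.cached_terms("cities")             # ['Lima', 'Oslo']
```

Until the flush runs, lookups in the model see only the term added through the dedicated function; after the flush, the directly inserted term appears as well.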
This redesign also affects row-based replication. Changes to the dictionaries table, either through dedicated functions or directly on the source, are sent to a replica via the binary log. The applier thread reads these binary log events on the replica and applies them successfully. However, the dictionary term cache on the replica doesn’t update automatically.
We introduced a new system variable called `component_masking_functions.dictionaries_flush_interval_seconds` (read-only, integer, unsigned, default 0).
When you set this variable to any value other than 0, the component starts a background thread at startup that periodically syncs the dictionaries table with the internal dictionary term cache. The value specifies the number of seconds between each sync.
If this variable has a non-zero value on a replica, the dictionary term cache eventually syncs with the underlying dictionaries table after receiving those binary log events.
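The periodic sync can be pictured as a background thread that calls a flush routine at a fixed interval. The Python sketch below is only an analogy for that behavior: `PeriodicFlusher`, `flush()`, `table`, and `state` are hypothetical names, and the demo uses a fractional interval purely so that it finishes quickly, whereas the real variable takes whole seconds.

```python
import threading
import time


class PeriodicFlusher:
    """Hypothetical model of a background thread that resyncs a cache."""

    def __init__(self, flush, interval_seconds):
        self._flush = flush
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        if self._interval == 0:
            return False               # 0 means no background thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return True

    def _run(self):
        # wait() returns False on timeout (time to flush), True once stopped.
        while not self._stop.wait(self._interval):
            self._flush()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()


table = {"Oslo"}                       # stands in for the dictionaries table
state = {"cache": frozenset()}         # stands in for the term cache
lock = threading.Lock()


def flush():
    with lock:
        state["cache"] = frozenset(table)


disabled = PeriodicFlusher(flush, 0).start()      # False: interval is 0

flusher = PeriodicFlusher(flush, 0.05)
started = flusher.start()
with lock:
    table.add("Lima")                  # e.g. a row applied from the binary log
time.sleep(0.5)                        # allow several flush cycles to run
flusher.stop()

with lock:
    synced = state["cache"] == frozenset(table)
```

With a non-zero interval the cache in this model catches up with the table after at most one interval, which mirrors the eventual sync described for replicas.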
Data masking techniques
The common data masking techniques are the following:
Technique | Description |
---|---|
Custom string | Replaces sensitive data with a specific string, such as a phone number with XXX-XXX-XXXX |
Data substitution | Replaces sensitive data with realistic alternative values, such as city name with another name from a dictionary |
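
Both techniques can be sketched in Python to show the idea. This is a language-neutral illustration, not the component's masking functions: `mask_custom_string()`, `substitute_from_dictionary()`, and the sample `cities` list are hypothetical and exist only for this example.

```python
import random


def mask_custom_string(phone):
    """Replace every digit with 'X' and keep separators, so the format survives."""
    return "".join("X" if ch.isdigit() else ch for ch in phone)


def substitute_from_dictionary(value, dictionary, rng):
    """Replace a value with a different term drawn at random from a dictionary."""
    candidates = [term for term in dictionary if term != value]
    if not candidates:
        raise ValueError("dictionary needs at least one alternative term")
    return rng.choice(candidates)


cities = ["Lima", "Oslo", "Cairo", "Perth"]
rng = random.Random(0)                 # seeded only to make the demo repeatable

masked_phone = mask_custom_string("555-867-5309")   # "XXX-XXX-XXXX"
masked_city = substitute_from_dictionary("Oslo", cities, rng)

print(masked_phone, masked_city)
```

The custom string keeps the shape of the original value but carries no information, while substitution yields a realistic value that differs from the original.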
Additional resources
Component:
Install the data masking component
Data masking component functions
Plugin: