Secure Data Sharing in Machine Learning: Exploring K-Anonymization and Differential Privacy

Malath Sabri Kareem

Abstract

Data privacy has become one of the most pressing problems of our time, particularly as more and more people submit identity data to sign up for online services. This work examines the combined application of k-anonymity and differential privacy to preserve the privacy of health-related information while maintaining its usefulness for analysis. K-anonymity seeks to ensure that no individual can be uniquely identified from their records by making each record indistinguishable from at least k-1 other records; differential privacy, in turn, bounds the influence that any single person's data can have on the outcome of an aggregate measurement. This study applies these techniques to health-related datasets and assesses how well the combined approach balances privacy and data utility. Empirical results show that combining k-anonymity and differential privacy is not only feasible in practice but also provides strong privacy protection with minimal loss of data quality. The combined method reduces re-identification risk by 80% while retaining 85% of the utility of the original raw data. These results demonstrate the applicability of the privacy-preserving methods studied in this paper to the protection of health data.
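The abstract does not reproduce the paper's datasets, parameters, or code. As a purely illustrative sketch of the combined approach it describes, one might pair a generalization-based k-anonymization step with a Laplace-noise differentially private count; the bin width, k, epsilon value, and toy data below are assumptions, not the study's actual settings.

```python
# Illustrative sketch only: not the paper's implementation.
# Combines generalization/suppression-based k-anonymization of a
# quasi-identifier with a Laplace-mechanism differentially private count.
import numpy as np
import pandas as pd

def k_anonymize_ages(df, k=5, bin_width=10):
    """Generalize the quasi-identifier 'age' into coarse bins and
    suppress any bin containing fewer than k records."""
    df = df.copy()
    df["age"] = (df["age"] // bin_width) * bin_width          # e.g. 37 -> 30
    group_sizes = df.groupby("age")["age"].transform("size")
    return df[group_sizes >= k]                               # drop rare bins

def dp_count(df, condition, epsilon=1.0):
    """Differentially private count: true count plus Laplace noise with
    scale 1/epsilon (the sensitivity of a counting query is 1)."""
    true_count = int(condition(df).sum())
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical toy data standing in for a health-related dataset.
data = pd.DataFrame({
    "age": np.random.randint(20, 80, size=1000),
    "diagnosis": np.random.choice(["A", "B"], size=1000),
})

anonymized = k_anonymize_ages(data, k=5)
noisy = dp_count(anonymized, lambda d: d["diagnosis"] == "A", epsilon=0.5)
print(f"Noisy count of diagnosis A: {noisy:.1f}")
```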
