Funding: Supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grant Nos. RGPIN-2020-06482, RGPIN-2016-06253, and CGSD2-503941-2017.
Abstract: When querying databases containing sensitive information, the privacy of the individuals stored in the database must be guaranteed. Such guarantees are provided by differentially private mechanisms, which add controlled noise to the query responses. However, most such mechanisms do not take into consideration the valid range of the query being posed; thus, noisy responses that fall outside this range may be produced. To rectify this, and thereby improve the utility of the mechanism, the commonly used Laplace distribution can be truncated to the valid range of the query and then normalized. However, such data-dependent normalization leaks additional information about the true query response, thereby violating the differential privacy guarantee. Here, we propose a new method that preserves the differential privacy guarantee through a careful determination of an appropriate scaling parameter for the Laplace distribution. We adapt the privacy guarantee in the context of the Laplace distribution to account for data-dependent normalization factors and study this guarantee for different classes of range-constraint configurations. For each class, we derive the optimal scaling parameter (i.e., the minimal value that preserves differential privacy) or provide an approximation thereof. As a result of this work, the Laplace distribution can be used to answer queries in a range-adherent and differentially private manner. To demonstrate the benefits of our proposed normalization method, we present an experimental comparison against other range-adherent mechanisms and show that our approach provides improved utility over the alternative mechanisms.
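The range-adherent sampling step described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration of a truncated-and-renormalized Laplace mechanism via inverse-CDF sampling; all function names are the author's own, and it deliberately uses the standard scale b = sensitivity/epsilon. As the abstract notes, with this naive scale the data-dependent normalization leaks information; the paper's contribution is deriving the (larger) scale that restores the differential privacy guarantee, and that derivation is not reproduced here.

```python
import math
import random

def laplace_cdf(x, mu, b):
    """CDF of the Laplace(mu, b) distribution."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)

def laplace_inv_cdf(u, mu, b):
    """Quantile function (inverse CDF) of Laplace(mu, b), for u in (0, 1)."""
    if u < 0.5:
        return mu + b * math.log(2.0 * u)
    return mu - b * math.log(2.0 * (1.0 - u))

def truncated_laplace(true_answer, b, lo, hi, rng=random):
    """Sample from Laplace(true_answer, b) truncated and renormalized to
    [lo, hi]: draw u uniformly from [F(lo), F(hi)] and invert the CDF,
    so the noisy response always lies in the valid query range."""
    u_lo = laplace_cdf(lo, true_answer, b)
    u_hi = laplace_cdf(hi, true_answer, b)
    u = u_lo + (u_hi - u_lo) * rng.random()
    return laplace_inv_cdf(u, true_answer, b)
```

Because the normalization factor u_hi - u_lo depends on the true answer, the privacy analysis must account for it; choosing b large enough to absorb that leakage is precisely what the paper works out.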
Funding: This work was supported by the Special Investigation on Science and Technology Basic Resources of the MOST of China (No. 2019FY100103), the National Natural Science Foundation of China (No. 62003291), the Xuzhou Science and Technology Project (No. KC20112), and the Industry-University-Research Cooperation Project in Jiangsu Province (No. BY2018124).
Abstract: Objective: Medical data mining and sharing is an important process in e-health applications. However, because these data contain a large amount of patients' personal private information, there is a risk of privacy disclosure during sharing and mining. Ensuring the security of medical big data during publishing, sharing, and mining has therefore become a focus of current research. The objective of our study is to design a framework based on a differential privacy protection mechanism to ensure the secure sharing of medical data. We developed a privacy protection query language (PQL) that integrates multiple data mining methods and provides a secure sharing function. Methods: This study was mainly performed at Xuzhou Medical University, China, and designs three sub-modules: a parsing module, a mining module, and a noising module. Each module encapsulates different computing methods, such as a composite parser and noise theory. In the PQL framework, we apply differential privacy theory to the results of the computations between modules to guarantee the security of various mining algorithms. These computing modules operate independently, but the mining results depend on their cooperation. In addition, PQL is encapsulated in MNSSp3, a data mining and secure sharing platform, and the data come from public data sets such as NCBI. The public data set (NCBI database) was used as the experimental data, and the data collection time was January 2020. Results: We designed and developed a query language that provides functions for medical data mining, sharing, and privacy preservation. We theoretically proved the performance of the PQL framework.
The experimental results show that the PQL framework ensures the security of each mining result, with the availability of the output results above 97%. Conclusion: Our framework enables medical data providers to securely share health or treatment data, and it provides a usable query language, based on a differential privacy mechanism, that enables researchers to mine information securely using data mining algorithms.
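The noising step that such a framework relies on can be illustrated with a short sketch. The names below (laplace_noise, private_count) are illustrative assumptions and not part of the actual PQL or MNSSp3 implementation, which is not reproduced here; the sketch only shows the standard Laplace mechanism applied to a COUNT query, whose sensitivity is 1, so noise with scale 1/epsilon suffices for epsilon-differential privacy.

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Draw one sample from Laplace(0, scale) by inverting the CDF of a
    uniform draw on (-0.5, 0.5)."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon, rng=random):
    """Answer a COUNT query under epsilon-differential privacy.
    Adding or removing one record changes the count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is calibrated."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

In the PQL pipeline as described, a query would first pass through the parsing and mining modules, and a noising step like this would then perturb the result before it is released to the analyst.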