Data quality strongly affects applications of grain big data, so data cleaning is a necessary and important task. In the MapReduce framework, parallelism is commonly used to run data cleaning in a highly scalable way, but without careful design the cleaning process contains a large amount of redundant computation, which lowers performance. In this research, we found that during data cleaning some tasks are often carried out multiple times on the same input files, or require the same intermediate results. To address this problem, we propose a new optimization technique based on task merging. By merging simple or redundant computations on the same input files, the number of computation loops in MapReduce can be reduced greatly. Experiments show that this significantly reduces overall system runtime, indicating that the data cleaning process is optimized. In this paper, we optimize several data cleaning modules, including entity identification, inconsistent data restoration, and missing value filling. Experimental results show that the proposed method improves the efficiency of grain big data cleaning.
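The task-merging idea in the abstract above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not the paper's implementation: the record layout, the two mappers, and the helper names (`run_job`, `separate_jobs`, `merged_job`, `fill_missing`) are all hypothetical, and the "MapReduce job" is simulated in plain Python rather than run on Hadoop. It shows two cleaning tasks that read the same input being merged into one scan while producing the same grouped output.

```python
from collections import defaultdict

# Hypothetical grain records: (record_id, region, moisture_percent or None).
RECORDS = [
    ("r1", "north", 13.0),
    ("r2", "north", None),
    ("r3", "south", 15.0),
    ("r4", "south", 14.0),
]

def map_region_stats(record):
    """Task A mapper: emit (value, 1) so a per-region mean can be computed."""
    _, region, moisture = record
    if moisture is not None:
        yield ("stats", region), (moisture, 1)

def map_missing_ids(record):
    """Task B mapper: emit the ids of records whose moisture is missing."""
    record_id, region, moisture = record
    if moisture is None:
        yield ("missing", region), record_id

def run_job(records, mappers):
    """Simulate one job: a single scan of the input feeds every mapper,
    then emitted values are grouped by key (the shuffle step)."""
    groups = defaultdict(list)
    for record in records:  # one scan of the input
        for mapper in mappers:
            for key, value in mapper(record):
                groups[key].append(value)
    return dict(groups)

def separate_jobs(records, mappers):
    """Baseline: one job per task, so the input is scanned once per task."""
    out = {}
    for mapper in mappers:
        out.update(run_job(records, [mapper]))
    return out, len(mappers)

def merged_job(records, mappers):
    """Merged: all tasks share a single scan of the same input."""
    return run_job(records, mappers), 1

def fill_missing(groups):
    """Reduce step: fill each missing value with its region mean."""
    filled = {}
    for (kind, region), ids in groups.items():
        if kind != "missing":
            continue
        pairs = groups.get(("stats", region), [])
        total = sum(value for value, _ in pairs)
        count = sum(n for _, n in pairs)
        for record_id in ids:
            filled[record_id] = total / count if count else None
    return filled

if __name__ == "__main__":
    tasks = [map_region_stats, map_missing_ids]
    baseline, baseline_scans = separate_jobs(RECORDS, tasks)
    merged, merged_scans = merged_job(RECORDS, tasks)
    print(baseline == merged, baseline_scans, merged_scans)
    print(fill_missing(merged))
```

Tagging each emitted key with the task that produced it (`"stats"`, `"missing"`) is what lets the merged job keep the two tasks' outputs apart after a shared scan.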
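To address the time-consuming, labor-intensive, and costly nature of traditional methods for classifying the body shape of human body parts, a body shape classification network that incorporates an attention mechanism (Attention Body Classification Net, A_BCN) is designed. The network consists of two modules, weakly supervised attention learning and data augmentation: the weakly supervised attention learning module obtains attention maps through an attention mechanism, and the data augmentation module uses the attention maps to guide image augmentation, including attention cropping, attention dropping, and attention averaging. The augmented images are fed back into the network to obtain feature maps, and these feature maps are fused with the attention maps for classification. On a self-built human body image dataset, the algorithm achieves an accuracy of 90.52%, improving classification accuracy and reducing cost.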
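Attention-guided cropping and dropping, as named in the abstract above, can be sketched roughly as follows. The abstract does not give the exact procedure, so this is an assumption: a cell counts as "high attention" when its value is at least `threshold` times the map maximum (with `0 < threshold <= 1` and non-negative attention values), the attention map has the same resolution as the image, and the function names are hypothetical. Attention averaging is not sketched because the abstract does not describe it.

```python
def attention_mask(attention, threshold):
    """Mark cells whose attention is at least `threshold` times the map maximum."""
    peak = max(max(row) for row in attention)
    return [[value >= threshold * peak for value in row] for row in attention]

def attention_crop(image, attention, threshold=0.5):
    """Crop the image to the bounding box of the high-attention region."""
    mask = attention_mask(attention, threshold)
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]

def attention_drop(image, attention, threshold=0.5):
    """Zero out the high-attention region so other regions must be used."""
    mask = attention_mask(attention, threshold)
    return [
        [0 if hit else pixel for pixel, hit in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]

if __name__ == "__main__":
    # Toy 3x3 grayscale image and attention map (illustrative values only).
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    attention = [[0.1, 0.2, 0.1], [0.1, 0.9, 0.8], [0.0, 0.2, 0.1]]
    print(attention_crop(image, attention))  # [[5, 6]]
    print(attention_drop(image, attention))  # [[1, 2, 3], [4, 0, 0], [7, 8, 9]]
```

Cropping focuses the next forward pass on the most discriminative region, while dropping hides that region so the network is pushed to use other parts of the image.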
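With the application and development of cloud-based healthcare systems, ensuring the secure and effective sharing of Electronic Health Record (EHR) data has become a key problem. To solve it, an EHR data sharing scheme supporting fine-grained search, attribute revocation, and outsourced encryption and decryption is proposed. The scheme has the following advantages. First, it uses ciphertext-policy attribute-based encryption so that patients have full control over their own EHR data, achieving fine-grained sharing of EHR data. Second, the added attribute revocation function protects patient privacy in a timely and effective manner. Third, combining searchable encryption with attribute-based encryption enables finer-grained search. Finally, outsourcing part of the attribute-related encryption and decryption computation, as well as the computation for keyword ciphertext generation, to the cloud server reduces the computational overhead of system users. In addition, security analysis, performance comparison, and experimental analysis show that, in a cloud healthcare system, the scheme enables medical institutions to share EHR data securely and effectively without violating patient privacy.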
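The access-control behavior described in the abstract above (a policy attached to each record, search restricted to users whose attributes satisfy it, and revocation of an attribute) can be modeled in a few lines. This sketch contains no cryptography at all and is not the proposed scheme: it only mimics the policy logic in plaintext, and the policy format, attribute names, and functions `satisfies` and `search` are hypothetical.

```python
def satisfies(policy, attributes):
    """Evaluate a policy tree: a string leaf, or ("AND" | "OR", child, ...)."""
    if isinstance(policy, str):
        return policy in attributes
    gate, *children = policy
    results = [satisfies(child, attributes) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")

def search(records, keyword, attributes):
    """Return ids of records that match the keyword AND whose policy the
    user's attribute set satisfies (plaintext stand-in for encrypted search)."""
    return [
        record["id"]
        for record in records
        if keyword in record["keywords"] and satisfies(record["policy"], attributes)
    ]

if __name__ == "__main__":
    # Hypothetical EHR entries with patient-chosen access policies.
    records = [
        {"id": "ehr-1", "keywords": {"ecg"},
         "policy": ("AND", "doctor", "cardiology")},
        {"id": "ehr-2", "keywords": {"ecg"},
         "policy": ("OR", "nurse", ("AND", "doctor", "oncology"))},
    ]
    user = {"doctor", "cardiology"}
    print(search(records, "ecg", user))   # ['ehr-1']
    user.discard("cardiology")            # models revoking one attribute
    print(search(records, "ecg", user))   # []
```

In the actual scheme these checks would be enforced by the encryption itself rather than by a trusted function, with part of the computation outsourced to the cloud server as the abstract states.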