Multimodal sentiment analysis utilizes multimodal data such as text, facial expressions, and voice to detect people’s attitudes. With the advent of distributed data collection and annotation, we can easily obtain and share such multimodal data. However, due to professional discrepancies among annotators and lax quality control, noisy labels might be introduced. Recent research suggests that deep neural networks (DNNs) overfit noisy labels, leading to poor performance. To address this challenging problem, we present a Multimodal Robust Meta Learning framework (MRML) for multimodal sentiment analysis that resists noisy labels and correlates distinct modalities simultaneously. Specifically, we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training. Besides, a multiple meta-learner (label corrector) strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels. We conducted experiments on three popular multimodal datasets to verify the superiority of our method by comparing it with four baselines.
It is crucial, while using healthcare data, to weigh the advantages of data privacy against the possible drawbacks. Data from several sources must be combined for use in many data mining applications. The medical practitioner may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures. Historically, numerous heuristics (e.g., greedy search) and metaheuristic-based techniques (e.g., evolutionary algorithms) have been created for positive association rules in privacy-preserving data mining (PPDM). When it comes to connecting seemingly unrelated diseases and drugs, negative association rules may be more informative than their positive counterparts. It is well known that during negative association rule mining a large number of uninteresting rules are formed, making this a difficult problem to tackle. In this research, we offer an adaptive method for negative association rule mining in vertically partitioned healthcare datasets that respects users’ privacy. The applied approach dynamically determines the transactions to be interrupted for information hiding, as opposed to predefining them. This study introduces a novel method for addressing the problem of negative association rules in healthcare data mining, one that is based on the Tabu-genetic optimization paradigm. Tabu search is advantageous since it removes a huge number of unnecessary rules and item sets. Experiments using benchmark healthcare datasets prove that the discussed scheme outperforms state-of-the-art solutions in terms of decreasing side effects and data distortions, as measured by the hiding-failure indicator.
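The paper’s contribution is the adaptive Tabu-genetic hiding algorithm itself, but the object it operates on, negative association rules, has a standard definition: supp(A → ¬B) = supp(A) − supp(A ∪ B), with confidence supp(A → ¬B)/supp(A). The sketch below computes these quantities on a toy transaction set; the item names are invented for illustration and nothing here reflects the paper’s optimizer:

```python
def negative_rule_confidence(transactions, antecedent, consequent):
    """Support and confidence of the negative rule A -> not-B.

    supp(A -> not-B) = supp(A) - supp(A u B);
    conf = supp(A -> not-B) / supp(A).
    """
    a, b = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    supp_a = sum(1 for t in transactions if a <= set(t)) / n
    supp_ab = sum(1 for t in transactions if (a | b) <= set(t)) / n
    supp_neg = supp_a - supp_ab
    conf = supp_neg / supp_a if supp_a else 0.0
    return supp_neg, conf
```

On four toy transactions where “aspirin” appears twice and co-occurs with “rash” once, the rule aspirin → ¬rash has support 0.25 and confidence 0.5.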
Recently, research on distributed data mining using grids has been in trend. This paper introduces a data mining algorithm based on distributed decision trees, which takes advantage of the conveniences and services supplied by the grid computing platform and can perform distributed classification on the grid.
Product data management (PDM) has been accepted as an important tool for the manufacturing industries. In recent years, more and more research has been conducted in the development of PDM. The research areas include system design, integration of object-oriented technology, data distribution, collaborative and distributed manufacturing working environments, security, and web-based integration. However, there are limitations to these studies. In particular, they cannot cater for PDM in a distributed manufacturing environment. This is especially true in South China, where many Hong Kong (HK) manufacturers have moved their production plants to different locations in the Pearl River Delta for cost reduction, while retaining their main offices in HK. Development of a PDM system is inherently complex. Product-related data cover product name, product part number (product identification), drawings, material specifications, dimension requirements, quality specifications, test results, lot size, production schedules, product data version and date of release, special tooling (e.g., jigs and fixtures), mould design, the project engineer in charge, and cost spreadsheets, while process data include engineering release, engineering change information management, and other workflow related to the process information. According to Cornelissen et al., a contemporary PDM system should contain management functions in structure, retrieval, release, change, and workflow. In system design, development, and implementation, a formal specification is necessary. However, there is no formal representation model for PDM systems. Therefore a graphical representation model is constructed to express the various scenarios of interactions between users and the PDM system. Statechart is then used to model the operations of the PDM system (Fig. 1). The statechart model bridges the current gap between requirements, scenarios, and the initial design specifications of the PDM system. After properly analyzing the PDM system, a new distributed PDM (DPDM) system is proposed. Both graphical representation and statechart models are constructed for the new DPDM system (Fig. 2). New product data of DPDM and new system functions are then investigated to support product information flow in the new distributed environment. It is found that statecharts allow formal representations to capture the information and control flows of both PDM and DPDM. In particular, the statechart offers additional expressive power, compared to the conventional state transition diagram, in terms of hierarchy, concurrency, history, and timing for DPDM behavioral modeling.
1 Introduction Geochemical mapping at national and continental scales continues to present challenges worldwide due to variations in geologic and geotectonic units. Use of the proper sampling media can provide rich information
An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during model training, which are essential but result in grossly imbalanced data distributions and in turn cause suboptimal model performance. In order to address the above issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase, and predicting the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
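How entity type and span distance might be combined into a single global feature vector can be sketched as follows; the encoding (a one-hot type pair concatenated with cumulative distance buckets) is an assumption for illustration, not the paper’s actual featurisation:

```python
def global_features(head_type, tail_type, head_end, tail_start,
                    type_vocab, buckets=(1, 2, 4, 8, 16)):
    """One-hot encoding of the (head type, tail type) pair concatenated
    with a bucketed token distance between the two entity spans."""
    n = len(type_vocab)
    pair = [0] * (n * n)  # one slot per ordered type pair
    pair[type_vocab.index(head_type) * n + type_vocab.index(tail_type)] = 1
    distance = max(0, tail_start - head_end)
    # Cumulative indicators: 1 for every bucket the distance fits into.
    dist = [1 if distance <= b else 0 for b in buckets]
    return pair + dist
```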
As a fundamental operation in ad hoc networks, broadcast achieves efficient message propagation. Particularly in cognitive radio ad hoc networks, where unlicensed users have different sets of available channels, broadcasts are carried out on multiple channels. Accordingly, channel selection and collision avoidance are challenging issues in balancing the efficiency against the reliability of broadcasting. In this paper, an anti-collision selective broadcast protocol, called acSB, is proposed. A channel selection algorithm based on limited neighbor information is considered to maximize the success rates of transmissions once the sender and receiver have the same channel. Moreover, an anti-collision scheme is adopted to avoid simultaneous rebroadcasts. Consequently, the proposed broadcast protocol acSB outperforms other approaches in terms of smaller transmission delay, higher message reach rate, and fewer broadcast collisions, as evaluated by simulations under different scenarios.
Graph data publication has been considered an important step for data analysis and mining. Graph data, which provide knowledge on interactions among entities, can be locally generated and held by distributed data owners. These data are usually sensitive and private, because they may be related to owners’ personal activities and can be hijacked by adversaries to conduct inference attacks. Current solutions either consider private graph data as centralized contents or disregard the overlapping of graphs in distributed manners. Therefore, this work proposes a novel framework for distributed graph publication. In this framework, differential privacy is applied to justify the safety of the published contents. It includes four phases, i.e., graph combination, plan construction sharing, data perturbation, and graph reconstruction. The published graph selection is guided by one data coordinator, and each graph is perturbed carefully with the Laplace mechanism. The problem of graph selection is formulated and proven to be NP-complete. Then, a heuristic algorithm is proposed for selection. The correctness of the combined graph and the differential privacy on all edges are analyzed. This study also discusses a scenario without a data coordinator and proposes some insights into graph publication.
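The Laplace mechanism applied to each published graph has a well-known generic form: add noise drawn from Laplace(sensitivity/ε) to each released statistic. A minimal stdlib-only sketch follows; the edge-weight list representation and the seed are assumptions for illustration, not the paper’s encoding:

```python
import math
import random

def laplace_noise(scale, rng):
    # Inverse-CDF sample from Laplace(0, scale).
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def perturb_edge_weights(weights, epsilon, sensitivity=1.0, seed=0):
    """Epsilon-DP release of edge weights: add Laplace(sensitivity/epsilon)
    noise to each weight independently."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [w + laplace_noise(scale, rng) for w in weights]
```

Smaller ε means a larger noise scale and therefore stronger privacy at the cost of utility.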
As more and more data are produced, finding a secure and efficient data access structure has become a major research issue. The centralized systems used by medical institutions for the management and transfer of Electronic Medical Records (EMRs) can be vulnerable to security and privacy threats, often lack interoperability, and give patients limited or no access to their own EMRs. In this paper, we first propose a privilege-based data access structure and incorporate it into an attribute-based encryption mechanism to handle the management and sharing of big data sets. Our proposed privilege-based data access structure makes managing healthcare records using mobile healthcare devices efficient and feasible for large numbers of users. We then propose a novel distributed multilevel EMR (d-EMR) management scheme, which uses blockchain to address security concerns and enables selective sharing of medical records among staff members that belong to different levels of a hierarchical institution. We deploy smart contracts on the Ethereum blockchain and utilize a distributed storage system to alleviate the dependence on the record-generating institutions to manage and share patient records. To preserve the privacy of patient records, our smart contract is designed to allow patients to verify attributes prior to granting access rights. We provide extensive security, privacy, and evaluation analyses to show that our proposed scheme is both efficient and practical.
Various application domains require the integration of distributed real-time or near-real-time systems with non-real-time systems. Smart cities, smart homes, ambient intelligent systems, and network-centric defense systems are among these application domains. The Data Distribution Service (DDS) is a communication mechanism based on the Data-Centric Publish-Subscribe (DCPS) model. It is used for distributed systems with real-time operational constraints. The Java Message Service (JMS) is a messaging standard for enterprise systems using Service-Oriented Architecture (SOA) for non-real-time operations. JMS allows Java programs to exchange messages in a loosely coupled fashion, and supports sending and receiving messages using a messaging queue and a publish-subscribe interface. In this article, we propose an architecture enabling the automated integration of distributed real-time and non-real-time systems. We test our proposed architecture using a distributed Command, Control, Communications, Computers, and Intelligence (C4I) system. The system has DDS-based real-time Combat Management System components deployed on naval warships, and SOA-based non-real-time Command and Control components used at headquarters. The proposed solution enables the exchange of data between these two systems efficiently. We compare the proposed solution with a similar study; our solution is superior in terms of automation support, ease of implementation, scalability, and performance.
Spectral clustering is a well-regarded subspace clustering algorithm that exhibits outstanding performance in hyperspectral image classification through eigenvalue decomposition of the Laplacian matrix. However, its classification accuracy is severely limited by the selected eigenvectors: the commonly used eigenvectors not only fail to guarantee the inclusion of detailed discriminative information, but also have high computational complexity. To address these challenges, we propose an intuitive eigenvector selection method based on the coincidence degree of the data distribution (CDES). First, the clustering result of an improved k-means, which reflects the spatial distribution of the various classes well, is used as the reference map. Then, the adjusted Rand index and adjusted mutual information are calculated to assess the consistency of the data distribution between each eigenvector and the reference map. Finally, the eigenvectors with high coincidence degrees are selected for clustering. A case study on hyperspectral mineral mapping demonstrated that the mapping accuracies of CDES are approximately 56.3%, 15.5%, and 10.5% higher than those of the commonly used top, high-entropy, and high-relevance eigenvectors, and CDES saves more than 99% of the eigenvector selection time. In particular, due to the unsupervised nature of k-means, CDES provides a novel solution for the autonomous feature selection of hyperspectral images.
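The CDES idea of scoring eigenvectors against a k-means reference map can be sketched with a plain (unadjusted) Rand index; note the paper itself uses the adjusted Rand index and adjusted mutual information, and here a simple sign threshold stands in for the actual clustering of each eigenvector:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree (Rand index)."""
    agree = 0
    pairs = list(combinations(range(len(labels_a)), 2))
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
    return agree / len(pairs)

def select_eigenvectors(eigvecs, reference_labels, k):
    """Rank eigenvectors by how well their sign pattern matches the
    reference clustering, and keep the indices of the top k."""
    scored = sorted(
        ((rand_index([v >= 0 for v in vec], reference_labels), idx)
         for idx, vec in enumerate(eigvecs)),
        reverse=True,
    )
    return [idx for _, idx in scored[:k]]
```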
Geothermal data are published using different IT services, formats, and content representations, and can refer to both regional- and global-scale information. Geothermal stakeholders search for information with different aims. E-infrastructures are collaborative platforms that address this diversity of aims and data representations. In this paper, we present a prototype for a European Geothermal Information Platform that uses INSPIRE recommendations and an e-infrastructure (D4Science) to collect, aggregate, and share data sets from different European data contributors, thus enabling stakeholders to retrieve and process a large amount of data. Our system merges segmented, national realities into one common framework. We demonstrate our approach by describing a platform that collects data from Italian, French, Hungarian, Swiss, and Icelandic geothermal data providers.
There has been a worldwide revolution in geoscientific data availability and access. Effectively infinite and instantaneous free access to geoscientific data is available from the World Wide System of Geoscience Data Centers and virtual observatories. In addition, national databanks and commercially available large exploration data sets also exist. These distributed data resources pose challenges for the future: to move toward their objective integration and visualization in order to discover new knowledge. Such advancements can facilitate meaningful interpretations and decision-making for the benefit of society at global and local scales. This article presents the Digital Earth initiative at a national level to address multiple domains, such as effective management of natural resources; interactive planning of exploration activities; and monitoring, mapping, and mitigation of natural hazards. It discusses a distributed geospatial data infrastructure and its importance in geoscientific data integration for efficient and interactive data retrieval, analysis, and visualization. Some examples are presented to demonstrate the advantages of integrated visualization in geoscientific analysis.
The field of health data management poses unique challenges in relation to data ownership, the privacy of data subjects, and the reusability of data. The FAIR Guidelines have been developed to address these challenges. The Virus Outbreak Data Network (VODAN) architecture builds on these principles, using the European Union’s General Data Protection Regulation (GDPR) framework to ensure compliance with local data regulations, while using information knowledge management concepts to further improve data provenance and interoperability. In this article we provide an overview of the terminology used in the field of FAIR data management, with a specific focus on FAIR-compliant health information management as implemented in the VODAN architecture.
Leading-edge supercomputers, such as the K computer, have generated a vast amount of simulation results, and most of these datasets were stored on the file system for post-hoc analysis such as visualization. In this work, we first investigated the data generation trends of the K computer by analyzing some operational log files. We verified a tendency to generate large numbers of distributed files as simulation outputs, and in most cases the number of files has been proportional to the number of utilized computational nodes, that is, each computational node produces one or more files. Considering that the computational cost of visualization tasks is usually much smaller than that required for large-scale numerical simulations, a flexible data input/output (I/O) management mechanism becomes highly useful for post-hoc visualization and analysis. In this work, we focused on the xDMlib data management library and its flexible data I/O mechanism in order to enable flexible loading of big computational climate simulation results. In the proposed approach, a pre-processing step is executed on the target distributed files to generate the lightweight metadata necessary for elaborating the data assignment mapping used in the subsequent data loading process. We evaluated the proposed approach by using a 32-node visualization cluster and the K computer. Besides the inevitable performance penalty of longer data loading times when using a smaller number of processes, there is the benefit of avoiding any data replication via copy, conversion, or extraction. In addition, users can freely select any number of nodes, without caring about the number of distributed files, for post-hoc visualization and analysis purposes.
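The data assignment mapping in xDMlib is built from pre-processed metadata; the core idea, decoupling the number of reader processes from the number of distributed files, can be illustrated with a simple round-robin assignment. The function name and policy below are illustrative only, not xDMlib’s API:

```python
def assign_files(num_files, num_procs):
    """Round-robin map from reader processes to distributed output files,
    so any number of processes can load any number of files."""
    return {p: list(range(p, num_files, num_procs)) for p in range(num_procs)}
```

With fewer readers than files, each process simply loads several files in turn, which is where the longer-loading-time trade-off mentioned above comes from.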
In distributed simulation based on the High Level Architecture (HLA), data distribution management (DDM) is one of the HLA services, whose purpose is to filter unnecessary data transfers over the network. DDM allows the sending federates and the receiving federates to express their interest using update regions and subscription regions in a multidimensional routing space. There are several matching algorithms to obtain the overlap information between the update regions and subscription regions. When the number of regions increases sharply, the matching process is time consuming. However, the existing algorithms are hard to parallelize to take advantage of the computing capabilities of multi-core processors. To reduce the computational overhead of region matching, we propose a parallel algorithm based on an order relation to accelerate the matching process. The new matching algorithm adopts a divide-and-conquer approach to divide the regions into multiple region-bound sublists, each of which comprises part of the region bounds. To calculate the intersections inside and among the region-bound sublists, two matching rules are presented. This approach has good performance since it performs region matching on the sublists in parallel and does not require unnecessary comparisons between regions in different sublists. Theoretical analysis has been carried out for the proposed algorithm, and experimental results show that it performs better than the major existing DDM matching algorithms.
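Independent of the paper’s parallel sublist algorithm, the matching problem itself is interval overlap between update and subscription regions. A one-dimensional, order-based sketch that sorts subscriptions by lower bound so each update region can stop scanning early:

```python
def overlaps(a, b):
    # Closed intervals (lo, hi) overlap iff neither ends before the other begins.
    return a[0] <= b[1] and b[0] <= a[1]

def match_regions(updates, subscriptions):
    """Sort subscription regions by lower bound; for each update region,
    scan only until a subscription starts past the update's upper bound."""
    subs = sorted(enumerate(subscriptions), key=lambda item: item[1][0])
    matches = []
    for ui, u in enumerate(updates):
        for si, s in subs:
            if s[0] > u[1]:
                break  # all later subscriptions start even further right
            if overlaps(u, s):
                matches.append((ui, si))
    return sorted(matches)
```

The real algorithm runs such comparisons on multiple bound sublists in parallel; this sketch only shows the order relation that makes the early exit sound.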
The influence of non-independent and identically distributed (non-IID) data on Federated Learning (FL) has been a serious concern. Clustered Federated Learning (CFL) is an emerging approach for reducing the impact of non-IID data, which employs client similarity calculated by relevant metrics for clustering. Unfortunately, the existing CFL methods only pursue a single accuracy improvement but ignore the convergence rate. Additionally, the designed client selection strategy affects the clustering results. Finally, traditional semi-supervised learning changes the distribution of data on clients, resulting in higher local costs and undesirable performance. In this paper, we propose a novel CFL method named ASCFL, which selects clients to participate in training and can dynamically adjust the balance between accuracy and convergence speed with datasets consisting of labeled and unlabeled data. To deal with unlabeled data, the label prediction strategy predicts labels using encoders. The client selection strategy improves accuracy and reduces overhead by selecting clients with higher losses to participate in the current round. Furthermore, the similarity-based clustering strategy uses a new indicator to measure the similarity between clients. Experimental results show that ASCFL has clear advantages in model accuracy and convergence speed over three state-of-the-art methods on two popular datasets.
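Client similarity in CFL is commonly computed from flattened model updates; the paper defines its own similarity indicator, so the cosine-similarity grouping below is only a generic stand-in for the idea of clustering clients by update similarity:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two flattened model-update vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def group_clients(updates, threshold=0.9):
    """Greedy clustering: put a client in the first group whose
    representative it is sufficiently similar to, else start a new group."""
    groups = []  # each group lists client indices; first member is representative
    for cid, vec in enumerate(updates):
        for g in groups:
            if cosine_similarity(updates[g[0]], vec) >= threshold:
                g.append(cid)
                break
        else:
            groups.append([cid])
    return groups
```

The threshold is a tunable assumption; real CFL methods typically derive cluster structure from the metric rather than a fixed cut-off.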
The launch of CBERS-01 (China–Brazil Earth Resources Satellite) in 1999, China’s first land observation satellite, marked an unprecedented milestone in Chinese satellite remote sensing history. Since then, a large number of applications have been developed that drew solely upon CBERS-01 and other Chinese land observation satellites. Application development has evolved from one satellite to multiple satellites, from one series of satellites to multiple series, and from scientific research to industrial applications. Six aspects of the Chinese land observation satellite program are discussed in this paper: development status, data sharing and distribution, satellite calibration, industrial data applications, future prospects, and conclusions.
This paper presents various limitations of current remote sensing data distribution models and proposes a new concept, called the location-based instant satellite image service, for a new generation of remote sensing image distribution systems. The essential feature of the service is that customers can subscribe to data based on a location of interest, and satellite image data received by the antenna are distributed to customers’ terminal devices instantly after imaging over the subscribed area. The workflow, architecture, and key technologies of the new-generation data distribution system are described. The system is composed of four parts: a data comprehensive processing component, a data management component, a product distribution component, and a data display component. Based on this, a prototype system is developed, which demonstrates the promising service model with great potential for increased usage in many applications.
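Matching a newly imaged footprint against subscribed locations is the heart of the proposed service. A minimal sketch using axis-aligned bounding boxes follows; real image footprints are polygons, and the subscriber identifiers here are invented:

```python
def covers(footprint, point):
    """Is a (min_lon, min_lat, max_lon, max_lat) image footprint over a point?"""
    min_lon, min_lat, max_lon, max_lat = footprint
    lon, lat = point
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def subscribers_to_notify(footprint, subscriptions):
    """IDs of subscribers whose location of interest lies inside a new image."""
    return sorted(sid for sid, pt in subscriptions.items()
                  if covers(footprint, pt))
```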
The word ‘pattern’ frequently appears in the visualisation and visual analytics literature, but what do we mean when we talk about patterns? We propose a practicable definition of the concept of a pattern in a data distribution as a combination of multiple interrelated elements of two or more data components that can be represented and treated as a unified whole. Our theoretical model describes how patterns are made by relationships existing between data elements. Knowing the types of these relationships, it is possible to predict what kinds of patterns may exist. We demonstrate how our model underpins and refines the established fundamental principles of visualisation. The model also suggests a range of interactive analytical operations that can support visual analytics workflows where patterns, once discovered, are explicitly involved in further data analysis.
Funding: Supported by STI 2030-Major Projects 2021ZD0200400, the National Natural Science Foundation of China (62276233 and 62072405), and the Key Research Project of Zhejiang Province (2023C01048).
文摘Multimodal sentiment analysis utilizes multimodal data such as text,facial expressions and voice to detect people’s attitudes.With the advent of distributed data collection and annotation,we can easily obtain and share such multimodal data.However,due to professional discrepancies among annotators and lax quality control,noisy labels might be introduced.Recent research suggests that deep neural networks(DNNs)will overfit noisy labels,leading to the poor performance of the DNNs.To address this challenging problem,we present a Multimodal Robust Meta Learning framework(MRML)for multimodal sentiment analysis to resist noisy labels and correlate distinct modalities simultaneously.Specifically,we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training.Besides,a multiple meta-learner(label corrector)strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels.We conducted experiments on three popular multimodal datasets to verify the superiority of ourmethod by comparing it with four baselines.
文摘It is crucial,while using healthcare data,to assess the advantages of data privacy against the possible drawbacks.Data from several sources must be combined for use in many data mining applications.The medical practitioner may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures.Historically,numerous heuristics(e.g.,greedy search)and metaheuristics-based techniques(e.g.,evolutionary algorithm)have been created for the positive association rule in privacy preserving data mining(PPDM).When it comes to connecting seemingly unrelated diseases and drugs,negative association rules may be more informative than their positive counterparts.It is well-known that during negative association rules mining,a large number of uninteresting rules are formed,making this a difficult problem to tackle.In this research,we offer an adaptive method for negative association rule mining in vertically partitioned healthcare datasets that respects users’privacy.The applied approach dynamically determines the transactions to be interrupted for information hiding,as opposed to predefining them.This study introduces a novel method for addressing the problem of negative association rules in healthcare data mining,one that is based on the Tabu-genetic optimization paradigm.Tabu search is advantageous since it removes a huge number of unnecessary rules and item sets.Experiments using benchmark healthcare datasets prove that the discussed scheme outperforms state-of-the-art solutions in terms of decreasing side effects and data distortions,as measured by the indicator of hiding failure.
文摘Recently, researches on distributed data mining by making use of grid are in trend. This paper introduces a data mining algorithm by means of distributed decision-tree,which has taken the advantage of conveniences and services supplied by the computing platform-grid,and can perform a data mining of distributed classification on grid.
文摘Product data management (PDM) has been accepted as an important tool for the manufacturing industries. In recent years, more and mor e researches have been conducted in the development of PDM. Their research area s include system design, integration of object-oriented technology, data distri bution, collaborative and distributed manufacturing working environment, secur ity, and web-based integration. However, there are limitations on their rese arches. In particular, they cannot cater for PDM in distributed manufacturing e nvironment. This is especially true in South China, where many Hong Kong (HK) ma nufacturers have moved their production plants to different locations in Pearl R iver Delta for cost reduction. However, they retain their main offices in HK. Development of PDM system is inherently complex. Product related data cover prod uct name, product part number (product identification), drawings, material speci fications, dimension requirement, quality specification, test result, log size, production schedules, product data version and date of release, special tooling (e.g. jig and fixture), mould design, project engineering in charge, cost spread sheets, while process data includes engineering release, engineering change info rmation management, and other workflow related to the process information. Accor ding to Cornelissen et al., the contemporary PDM system should contains manageme nt functions in structure, retrieval, release, change, and workflow. In system design, development and implementation, a formal specification is nece ssary. However, there is no formal representation model for PDM system. Theref ore a graphical representation model is constructed to express the various scena rios of interactions between users and the PDM system. Statechart is then used to model the operations of PDM system, Fig.1. Statechart model bridges the curr ent gap between requirements, scenarios, and the initial design specifications o f PDM system. 
After properly analyzing the PDM system, a new distributed PDM (DPDM) system is proposed. Both graphical representation and statechart models are constructed for the new DPDM system (Fig.2). New product data of the DPDM and new system functions are then investigated to support product information flow in the new distributed environment. It is found that statecharts allow formal representations to capture the information and control flows of both PDM and DPDM. In particular, the statechart offers additional expressive power compared to the conventional state transition diagram, in terms of hierarchy, concurrency, history, and timing for DPDM behavioral modeling.
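The role of a statechart as an executable specification can be illustrated with a small transition-table sketch. Everything below is hypothetical (the document-lifecycle states and events are invented for illustration and are not taken from the paper, whose statecharts additionally model hierarchy, concurrency, and timing):

```python
# Minimal flat-statechart sketch: states, events, and a transition table.
# States/events here are an invented PDM document lifecycle.

class Statechart:
    def __init__(self, initial, transitions):
        self.state = initial
        # transitions: {(state, event): next_state}
        self.transitions = transitions
        self.history = [initial]  # trace of visited states

    def fire(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = self.transitions[key]
        self.history.append(self.state)
        return self.state

# Hypothetical lifecycle of a product document under PDM control.
pdm_doc = Statechart(
    "draft",
    {
        ("draft", "submit"): "in_review",
        ("in_review", "approve"): "released",
        ("in_review", "reject"): "draft",
        ("released", "change_request"): "in_change",
        ("in_change", "approve"): "released",
    },
)
```

A full statechart formalism would add nested (hierarchical) states and concurrent regions on top of this flat core.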
Funding: Supported by the Special Scientific Research Fund of Public Welfare Profession of the Ministry of Land and Resources of the People's Republic of China (No. 201011057).
Abstract: Geochemical mapping at national and continental scales continues to present challenges worldwide due to variations in geologic and geotectonic units. Use of the proper sampling media can provide rich information
Funding: Supported by the National Key Research and Development Program [2020YFB1006302].
Abstract: An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during model training, which are essential but result in grossly imbalanced data distributions and in turn cause suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase and predicting the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
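A toy sketch of the two-phase idea follows. The scoring fields and threshold are invented stand-ins for the paper's span encoders; the point is only that phase 2 assigns fine-grained types solely to spans that survived phase 1's binary entity/non-entity filter, shrinking the pool of negatives the type classifier sees:

```python
# Two-phase span classification sketch (hypothetical scores, not the
# paper's actual model): phase 1 filters spans, phase 2 types survivors.

def phase1_is_entity(span, threshold=0.5):
    # stand-in binary score; a real model would encode the span text
    return span["score"] >= threshold

def phase2_entity_type(span):
    # stand-in typed classifier, applied only to phase-1 positives
    return max(span["type_scores"], key=span["type_scores"].get)

def two_phase_extract(spans):
    candidates = [s for s in spans if phase1_is_entity(s)]             # phase 1
    return [(s["text"], phase2_entity_type(s)) for s in candidates]   # phase 2

spans = [
    {"text": "Paris", "score": 0.9, "type_scores": {"LOC": 0.8, "PER": 0.1}},
    {"text": "the",   "score": 0.1, "type_scores": {"LOC": 0.2, "PER": 0.2}},
    {"text": "Curie", "score": 0.7, "type_scores": {"LOC": 0.1, "PER": 0.9}},
]
```

The same filtering-then-typing structure applies to relation spans in the paper's full pipeline.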
Abstract: As a fundamental operation in ad hoc networks, broadcast can achieve efficient message propagation. Particularly in cognitive radio ad hoc networks, where unlicensed users have different sets of available channels, broadcasts are carried out on multiple channels. Accordingly, channel selection and collision avoidance are challenging issues in balancing the efficiency of broadcasting against its reliability. In this paper, an anti-collision selective broadcast protocol, called acSB, is proposed. A channel selection algorithm based on limited neighbor information is used to maximize the success rate of transmissions once the sender and receiver share a channel. Moreover, an anti-collision scheme is adopted to avoid simultaneous rebroadcasts. Consequently, the proposed acSB broadcast outperforms other approaches in terms of smaller transmission delay, higher message reach rate, and fewer broadcast collisions, as evaluated by simulations under different scenarios.
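The channel-selection idea — pick the available channel shared by the most known neighbors, so one transmission reaches as many receivers as possible — can be sketched as below. The data layout is hypothetical, and the real acSB protocol additionally handles collision avoidance:

```python
# Neighbor-aware channel choice sketch (illustrative data layout).

def pick_broadcast_channel(my_channels, neighbor_channels):
    # neighbor_channels: {neighbor_id: set of that neighbor's channels}
    def reach(ch):
        # number of neighbors that could hear a broadcast on channel ch
        return sum(1 for chans in neighbor_channels.values() if ch in chans)
    return max(my_channels, key=reach)

me = {1, 2, 3}
neighbors = {"a": {1, 2}, "b": {2}, "c": {2, 3}, "d": {4}}
```

Here channel 2 is chosen because three of the four neighbors can receive on it; neighbor "d" would need a separate rebroadcast on another channel.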
Funding: Supported by the National Natural Science Foundation of China (Nos. U19A2059 and 61802050) and the Ministry of Science and Technology of Sichuan Province Program (Nos. 2021YFG0018 and 20ZDYF0343).
Abstract: Graph data publication has been considered an important step for data analysis and mining. Graph data, which provide knowledge on interactions among entities, can be locally generated and held by distributed data owners. These data are usually sensitive and private, because they may be related to owners' personal activities and can be hijacked by adversaries to conduct inference attacks. Current solutions either treat private graph data as centralized contents or disregard the overlapping of graphs held in a distributed manner. Therefore, this work proposes a novel framework for distributed graph publication. In this framework, differential privacy is applied to justify the safety of the published contents. It includes four phases, i.e., graph combination, plan construction sharing, data perturbation, and graph reconstruction. The published graph selection is guided by one data coordinator, and each graph is perturbed carefully with the Laplace mechanism. The problem of graph selection is formulated and proven to be NP-complete. Then, a heuristic algorithm is proposed for selection. The correctness of the combined graph and the differential privacy on all edges are analyzed. This study also discusses a scenario without a data coordinator and offers some insights into graph publication.
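A minimal sketch of per-edge Laplace perturbation, assuming unit sensitivity on edge weights; the epsilon value and edge representation are illustrative, and the paper's framework additionally coordinates which graphs get published:

```python
# Laplace-mechanism sketch for edge-weight perturbation before publication.
# Assumptions: unit sensitivity, illustrative epsilon, {(u, v): weight} edges.

import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) by inverse-CDF transform of a uniform draw."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def perturb_edges(edges, epsilon, sensitivity=1.0, seed=0):
    # add Laplace(sensitivity / epsilon) noise to every edge weight
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {e: w + laplace_noise(scale, rng) for e, w in edges.items()}
```

Smaller epsilon gives stronger privacy but larger distortion; the seed is fixed here only to make the sketch reproducible.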
Funding: This work was supported in part by the National Natural Science Foundation of China (CCF1919154, ECCS-1923409).
Abstract: As more and more data is produced, finding a secure and efficient data access structure has become a major research issue. The centralized systems used by medical institutions for the management and transfer of Electronic Medical Records (EMRs) can be vulnerable to security and privacy threats, often lack interoperability, and give patients limited or no access to their own EMRs. In this paper, we first propose a privilege-based data access structure and incorporate it into an attribute-based encryption mechanism to handle the management and sharing of big data sets. Our proposed privilege-based data access structure makes managing healthcare records using mobile healthcare devices efficient and feasible for large numbers of users. We then propose a novel distributed multilevel EMR (d-EMR) management scheme, which uses blockchain to address security concerns and enables selective sharing of medical records among staff members that belong to different levels of a hierarchical institution. We deploy smart contracts on the Ethereum blockchain and utilize a distributed storage system to alleviate the dependence on the record-generating institutions to manage and share patient records. To preserve the privacy of patient records, our smart contract is designed to allow patients to verify attributes prior to granting access rights. We provide extensive security, privacy, and evaluation analyses to show that our proposed scheme is both efficient and practical.
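The privilege-based access idea can be caricatured as a check of a minimum privilege level plus required attributes. The roles, levels, and attribute names below are invented for illustration; the paper realises this structure inside attribute-based encryption and smart contracts rather than a plain function:

```python
# Illustrative privilege-plus-attributes access check (hypothetical roles).

LEVELS = {"nurse": 1, "physician": 2, "department_head": 3}

def can_access(requester, record):
    # access requires BOTH sufficient privilege level AND all required attributes
    level_ok = LEVELS[requester["role"]] >= record["min_level"]
    attrs_ok = record["required_attrs"] <= requester["attrs"]  # subset test
    return level_ok and attrs_ok

record = {"min_level": 2, "required_attrs": {"cardiology"}}
physician = {"role": "physician", "attrs": {"cardiology", "on_call"}}
nurse = {"role": "nurse", "attrs": {"cardiology"}}
```

In the d-EMR scheme the analogous check is enforced cryptographically: a key only decrypts records whose policy the holder's attributes satisfy.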
Abstract: Various application domains require the integration of distributed real-time or near-real-time systems with non-real-time systems. Smart cities, smart homes, ambient intelligence systems, and network-centric defense systems are among these application domains. Data Distribution Service (DDS) is a communication mechanism based on the Data-Centric Publish-Subscribe (DCPS) model; it is used for distributed systems with real-time operational constraints. Java Message Service (JMS) is a messaging standard for enterprise systems using Service Oriented Architecture (SOA) for non-real-time operations. JMS allows Java programs to exchange messages in a loosely coupled fashion and supports sending and receiving messages through both a message queue and a publish-subscribe interface. In this article, we propose an architecture enabling the automated integration of distributed real-time and non-real-time systems. We test the proposed architecture using a distributed Command, Control, Communications, Computers, and Intelligence (C4I) system. The system has DDS-based real-time Combat Management System components deployed on naval warships, and SOA-based non-real-time Command and Control components used at headquarters. The proposed solution enables the efficient exchange of data between these two systems. We compare the proposed solution with a similar study; our solution is superior in terms of automation support, ease of implementation, scalability, and performance.
Funding: Supported by the [National Key Research and Development Program] under Grant [number 2019YFE0126700] and the [Shandong Provincial Natural Science Foundation] under Grant [number ZR2020QD018].
Abstract: Spectral clustering is a well-regarded subspace clustering algorithm that exhibits outstanding performance in hyperspectral image classification through eigenvalue decomposition of the Laplacian matrix. However, its classification accuracy is severely limited by the selected eigenvectors: the commonly used eigenvectors not only fail to guarantee the inclusion of detailed discriminative information, but are also computationally expensive to obtain. To address these challenges, we propose an intuitive eigenvector selection method based on the coincidence degree of the data distribution (CDES). First, the clustering result of an improved k-means, which reflects the spatial distribution of the classes well, is used as the reference map. Then, the adjusted Rand index and adjusted mutual information are calculated to assess the consistency of the data distribution between each eigenvector and the reference map. Finally, the eigenvectors with high coincidence degrees are selected for clustering. A case study on hyperspectral mineral mapping demonstrated that the mapping accuracies of CDES are approximately 56.3%, 15.5%, and 10.5% higher than those of the commonly used top, high-entropy, and high-relevance eigenvectors, respectively, and that CDES saves more than 99% of the eigenvector selection time. In particular, owing to the unsupervised nature of k-means, CDES provides a novel solution for autonomous feature selection on hyperspectral images.
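The selection step of CDES can be sketched as follows, with plain pairwise agreement standing in for the adjusted Rand index / adjusted mutual information, and a sign-based split standing in for clustering each eigenvector; all data are toy values:

```python
# CDES-style eigenvector selection sketch: rank candidate eigenvectors
# by agreement between their induced split and a k-means reference map.
# Pairwise agreement is a simplified stand-in for ARI/AMI.

from itertools import combinations

def pair_agreement(labels_a, labels_b):
    # fraction of point pairs on which the two labelings agree
    # (both "same cluster" or both "different cluster")
    pairs = list(combinations(range(len(labels_a)), 2))
    same = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return same / len(pairs)

def select_eigenvectors(eigenvectors, reference, k):
    scored = []
    for idx, vec in enumerate(eigenvectors):
        split = [1 if v >= 0 else 0 for v in vec]  # binarize by sign
        scored.append((pair_agreement(split, reference), idx))
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:k]]

reference = [0, 0, 1, 1]            # toy k-means reference map
eigs = [
    [-0.5, -0.4, 0.6, 0.7],         # split matches the reference
    [0.1, -0.2, 0.3, -0.4],         # uninformative split
]
```

The actual method clusters each eigenvector rather than thresholding it and uses proper chance-corrected indices, but the ranking logic is the same.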
Abstract: Geothermal data are published using different IT services, formats, and content representations, and can refer to both regional- and global-scale information. Geothermal stakeholders search for information with different aims. E-Infrastructures are collaborative platforms that address this diversity of aims and data representations. In this paper, we present a prototype for a European Geothermal Information Platform that uses INSPIRE recommendations and an e-Infrastructure (D4Science) to collect, aggregate, and share data sets from different European data contributors, thus enabling stakeholders to retrieve and process a large amount of data. Our system merges segmented, national realities into one common framework. We demonstrate our approach by describing a platform that collects data from Italian, French, Hungarian, Swiss, and Icelandic geothermal data providers.
Abstract: There has been a worldwide revolution in geoscientific data availability and access. Effectively infinite, instantaneous, and free access to geoscientific data is available from the World Wide System of Geoscience Data Centers and Virtual Observatories. In addition, national databanks and commercially available large exploration data sets also exist. These distributed data resources pose a challenge for the future: to move toward their objective integration and visualization in order to discover new knowledge. Such advancements can facilitate meaningful interpretations and decision-making for the benefit of society at global and local scales. This article presents the Digital Earth initiative at a national level to address multiple domains, such as effective management of natural resources; interactive planning of exploration activities; and monitoring, mapping, and mitigation of natural hazards. It discusses a distributed geospatial data infrastructure and its importance in geoscientific data integration for efficient and interactive data retrieval, analysis, and visualization. Some examples are presented to demonstrate the advantages of integrated visualization in geoscientific analysis.
基金VODAN-Africathe Philips Foundation+2 种基金the Dutch Development Bank FMOCORDAIDthe GO FAIR Foundation for supporting this research
Abstract: The field of health data management poses unique challenges in relation to data ownership, the privacy of data subjects, and the reusability of data. The FAIR Guidelines have been developed to address these challenges. The Virus Outbreak Data Network (VODAN) architecture builds on these principles, using the European Union's General Data Protection Regulation (GDPR) framework to ensure compliance with local data regulations, while using information knowledge management concepts to further improve data provenance and interoperability. In this article we provide an overview of the terminology used in the field of FAIR data management, with a specific focus on FAIR-compliant health information management as implemented in the VODAN architecture.
Funding: Supported by the "Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures" in Japan (Project IDs: jh170043, jh170051).
Abstract: Leading-edge supercomputers, such as the K computer, have generated a vast amount of simulation results, and most of these datasets were stored on the file system for post-hoc analysis such as visualization. In this work, we first investigated the data generation trends of the K computer by analyzing its operational log files. We verified a tendency to generate large numbers of distributed files as simulation outputs; in most cases, the number of files is proportional to the number of utilized computational nodes, that is, each computational node produces one or more files. Considering that the computational cost of visualization tasks is usually much smaller than that of large-scale numerical simulations, a flexible data input/output (I/O) management mechanism becomes highly useful for post-hoc visualization and analysis. In this work, we focused on the xDMlib data management library and its flexible data I/O mechanism in order to enable flexible loading of big computational climate simulation results. In the proposed approach, a pre-processing step is executed on the target distributed files to generate lightweight metadata needed to elaborate the data assignment mapping used in the subsequent data loading process. We evaluated the proposed approach using a 32-node visualization cluster and the K computer. Despite the inevitable performance penalty of longer data loading times when using a smaller number of processes, the approach avoids any data replication via copy, conversion, or extraction. In addition, users can freely select any number of nodes for post-hoc visualization and analysis, regardless of the number of distributed files.
Abstract: In distributed simulation based on the High Level Architecture (HLA), data distribution management (DDM) is one of the HLA services, whose purpose is to filter unnecessary data transfers over the network. DDM allows the sending federates and the receiving federates to express their interest using update regions and subscription regions in a multidimensional routing space. Several matching algorithms exist to obtain overlap information between the update regions and subscription regions, but when the number of regions increases sharply, the matching process is time-consuming. Moreover, the existing algorithms are hard to parallelize to take advantage of the computing capabilities of multi-core processors. To reduce the computational overhead of region matching, we propose a parallel algorithm based on an order relation to accelerate the matching process. The new matching algorithm adopts a divide-and-conquer approach to divide the regions into multiple region-bound sublists, each of which comprises part of the region bounds. To calculate the intersections inside and among the region-bound sublists, two matching rules are presented. This approach performs well because it matches regions on the sublists in parallel and does not require unnecessary comparisons between regions in different sublists. A theoretical analysis has been carried out for the proposed algorithm, and experimental results show that it performs better than the major existing DDM matching algorithms.
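The core of order-relation region matching can be sketched as a single sweep over sorted region bounds in one dimension: an overlap is reported whenever an update region and a subscription region are open at the same time. This sequential sketch omits the paper's parallel division into sublists, and the data layout is illustrative:

```python
# 1-D sweep sketch of DDM region matching over sorted region bounds.

def match_regions(updates, subscriptions):
    # updates / subscriptions: {region_id: (lower, upper)} intervals
    events = []
    for rid, (lo, hi) in updates.items():
        events.append((lo, 0, "u", rid))   # open  (0 sorts before close at ties,
        events.append((hi, 1, "u", rid))   # close  so touching regions overlap)
    for rid, (lo, hi) in subscriptions.items():
        events.append((lo, 0, "s", rid))
        events.append((hi, 1, "s", rid))
    events.sort()
    open_u, open_s, overlaps = set(), set(), set()
    for _, is_close, side, rid in events:
        if is_close:
            (open_u if side == "u" else open_s).discard(rid)
        elif side == "u":
            open_u.add(rid)
            overlaps |= {(rid, s) for s in open_s}
        else:
            open_s.add(rid)
            overlaps |= {(u, rid) for u in open_u}
    return overlaps

updates_ex = {"U1": (0, 5), "U2": (10, 12)}
subs_ex = {"S1": (3, 8), "S2": (11, 20)}
```

Sorting costs O(n log n) and the sweep is linear, versus O(n^2) for brute-force pairwise comparison; the paper's parallel version partitions the sorted bound list across cores.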
Funding: Supported by the National Key Research and Development Program of China (No. 2019YFC1520904) and the National Natural Science Foundation of China (No. 61973250).
Abstract: The influence of non-Independent and Identically Distributed (non-IID) data on Federated Learning (FL) has been a serious concern. Clustered Federated Learning (CFL) is an emerging approach for reducing the impact of non-IID data that clusters clients by similarity calculated with relevant metrics. Unfortunately, existing CFL methods pursue only a single accuracy improvement and ignore the convergence rate. Additionally, the designed client selection strategy affects the clustering results. Finally, traditional semi-supervised learning changes the distribution of data on clients, resulting in higher local costs and undesirable performance. In this paper, we propose a novel CFL method named ASCFL, which selects clients to participate in training and can dynamically adjust the balance between accuracy and convergence speed on datasets consisting of labeled and unlabeled data. To deal with unlabeled data, the label prediction strategy predicts labels using encoders. The client selection strategy improves accuracy and reduces overhead by selecting clients with higher losses to participate in the current round. Furthermore, the similarity-based clustering strategy uses a new indicator to measure the similarity between clients. Experimental results show that ASCFL has clear advantages in model accuracy and convergence speed over three state-of-the-art methods on two popular datasets.
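The loss-driven client selection described for ASCFL might look like the following sketch: each round, prefer the clients whose latest local loss is highest, on the idea that they benefit most from further training. The field names and selection fraction are illustrative, not taken from the paper:

```python
# Loss-driven client selection sketch (illustrative fields and fraction).

def select_clients(client_losses, fraction=0.5):
    # client_losses: {client_id: latest local loss}
    # keep at least one client even for tiny fractions
    k = max(1, int(len(client_losses) * fraction))
    ranked = sorted(client_losses, key=client_losses.get, reverse=True)
    return set(ranked[:k])

losses = {"c1": 0.9, "c2": 0.2, "c3": 0.7, "c4": 0.1}
```

In a full CFL loop, the selected set would feed the round's local training, and the similarity-based clustering would group clients before aggregation.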
Funding: Supported in part by the National Basic Research Program of China (973 Program, Nos. 2014CB744201 and 2012CB719902), the Program for New Century Excellent Talents in University, the National High Technology Research and Development Program of China (No. 2011AA120203), the National Natural Science Foundation of China (No. 41371430), and the Program for Changjiang Scholars and Innovative Research Team in University under Grant IRT1278.
Abstract: The launch of CBERS-01 (China Brazil Earth Resource Satellite) in 1999, China's first land observation satellite, was an unprecedented milestone in Chinese satellite remote sensing history. Since then, a large number of applications have been developed that drew upon CBERS-01 and other Chinese land observation satellites. Application development has evolved from one satellite to multiple satellites, from one series of satellites to multiple series, and from scientific research to industrial applications. Six aspects of the Chinese land observation satellite program are discussed in this paper: development status, data sharing and distribution, satellite calibration, industrial data applications, future prospects, and conclusions.
Abstract: This paper presents various limitations of current remote sensing data distribution models and proposes a new concept, the location-based instant satellite image service, for a new generation of remote sensing image distribution systems. The essential feature of the service is that customers can subscribe to data based on a location of interest, and satellite image data received by the antenna are distributed to customers' terminal devices immediately after imaging over the subscribed area. The workflow, architecture, and key technologies of the new-generation data distribution system are described. The system is composed of four parts: a data comprehensive processing component, a data management component, a product distribution component, and a data display component. Based on this, a prototype system was developed, which demonstrates this promising service model with great potential for increased usage in many applications.
Funding: This research was supported by the Fraunhofer Center for Machine Learning within the Fraunhofer Cluster for Cognitive Internet Technologies, by DFG within Priority Programme 1894 (SPP VGI), by the EU in project SoBigData++, by SESAR in projects TAPAS and SIMBAD, and by the Austrian Science Fund (FWF) project KnowVA (grant P31419-N31).
Abstract: The word 'pattern' frequently appears in the visualisation and visual analytics literature, but what do we mean when we talk about patterns? We propose a practicable definition of the concept of a pattern in a data distribution as a combination of multiple interrelated elements of two or more data components that can be represented and treated as a unified whole. Our theoretical model describes how patterns arise from relationships existing between data elements. Knowing the types of these relationships, it is possible to predict what kinds of patterns may exist. We demonstrate how our model underpins and refines the established fundamental principles of visualisation. The model also suggests a range of interactive analytical operations that can support visual analytics workflows where patterns, once discovered, are explicitly involved in further data analysis.