Using quantum algorithms to solve various problems has attracted widespread attention with the development of quantum computing. Researchers are particularly interested in exploiting the speedup of quantum algorithms to solve NP-complete problems. This paper focuses on the well-known NP-complete problem of finding a minimum dominating set in an undirected graph. To expedite the search, a quantum algorithm based on Grover's search is proposed. However, the number of solutions for the minimum dominating set is unknown, which rules out direct use of the original Grover's search. Thus, a swap test method is introduced to determine the number of iterations required. The oracle, diffusion operators, and swap test are designed with implementable quantum gates. The query complexity is O(1.414^n) and the space complexity is O(n). To validate the proposed approach, the Qiskit software package is used to simulate the quantum circuit, yielding the anticipated results.
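A small classical Python sketch shows why the solution count matters. It illustrates the idea and is not the paper's circuit. `is_dominating_set` is the predicate a Grover oracle would mark. `grover_iterations` is the standard optimal iteration count ⌊(π/4)√(N/M)⌋ for N = 2^n basis states and M marked states. The brute-force `count_dominating_sets` only stands in for the step the paper performs with a swap test. All function names are hypothetical. Since 1.414 ≈ √2, the stated O(1.414^n) query complexity matches the usual O(√N) Grover scaling over the N = 2^n vertex subsets.

```python
import math
from itertools import combinations

# Illustrative classical sketch; function names and the example graph are hypothetical.


def is_dominating_set(adj, subset):
    """True if every vertex is in `subset` or adjacent to a vertex in `subset`."""
    covered = set(subset)
    for v in subset:
        covered.update(adj[v])
    return len(covered) == len(adj)


def count_dominating_sets(adj, k):
    """Brute-force count of size-k dominating sets (the marked states, M)."""
    return sum(is_dominating_set(adj, c) for c in combinations(list(adj), k))


def grover_iterations(n_vertices, n_solutions):
    """Optimal Grover iteration count floor(pi/4 * sqrt(N/M)) with N = 2**n."""
    n_states = 2 ** n_vertices
    return math.floor(math.pi / 4 * math.sqrt(n_states / n_solutions))


# Path graph 0 - 1 - 2: the only dominating set of size 1 is {1}.
path = {0: {1}, 1: {0, 2}, 2: {1}}
m = count_dominating_sets(path, 1)
print(m, grover_iterations(len(path), m))  # 1 2
```

The optimal count shrinks as M grows. Running Grover's search with a wrongly guessed M can therefore rotate past the solutions, which is why the number of iterations has to be fixed by a separate step.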
The main goal of this research is to assess the impact of race, age at diagnosis, sex, and phenotype on the incidence and survival of acute lymphocytic leukemia (ALL) among patients in the United States. By taking these factors into account, the study explores how existing cancer registry data can aid in the early detection and effective treatment of ALL. We hypothesized that race, age at diagnosis, sex, and phenotype are significantly correlated with the incidence and survival of ALL patients. Incidence and survival data were evaluated using the SEER*Stat statistical software from the National Cancer Institute. Analysis of the incidence data revealed a higher prevalence of ALL among the Caucasian population. The majority of ALL cases (59%) occurred in patients aged 0 to 19 years at diagnosis, and 56% of the affected individuals were male. The B-cell phenotype accounted for most ALL cases (73%). In the survival data, 5-year survival rates slightly exceeded 10-year survival rates for each demographic group. Survival rates of African American patients were the lowest compared with Caucasian, Asian, Pacific Islander, Alaska Native, Native American, and other patients. Survival rates decreased progressively with patient age. This study also examined the typical treatments for ALL patients, mainly chemotherapy, occasionally supplemented by radiation therapy as required. The study demonstrated the considerable efficacy of chemotherapy in improving patients' chances of survival, whereas untreated patients faced a less favorable prognosis. Although a significant amount of data and information already exists, this study can help doctors diagnose patients with certain characteristics in the future. It will further assist health care professionals in screening potential patients and detecting cases early. This could also save the lives of elderly patients, who have a higher mortality rate from this disease.
This paper presents a generalized method for incrementally updating the approximations of a concept, which can serve as an effective tool for dealing with dynamic attribute generalization. By combining this method with the LERS inductive learning algorithm, the paper also introduces a generalized quasi-incremental algorithm for learning classification rules from databases.
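An incremental method is measured against recomputation from scratch. The Python sketch below shows that non-incremental baseline. It is an illustration, not the paper's algorithm or LERS. Lower and upper approximations are computed from the equivalence classes of an attribute subset, then recomputed after one attribute is generalized. The helper names and the toy table are hypothetical.

```python
from collections import defaultdict

# Non-incremental baseline for illustration; names and data are hypothetical.


def partition(table, attrs):
    """Equivalence classes of objects that share the same values on `attrs`."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def approximations(table, attrs, concept):
    """Lower and upper rough-set approximations of the object set `concept`."""
    lower, upper = set(), set()
    for eq in partition(table, attrs):
        if eq <= concept:   # class lies entirely inside the concept
            lower |= eq
        if eq & concept:    # class overlaps the concept
            upper |= eq
    return lower, upper


def generalize(table, attr, mapping):
    """Replace each value of `attr` with a coarser value (attribute generalization)."""
    return {obj: {**row, attr: mapping(row[attr])} for obj, row in table.items()}


table = {1: {"age": 25}, 2: {"age": 25}, 3: {"age": 40}, 4: {"age": 45}}
concept = {1, 3}
print(approximations(table, ["age"], concept))  # ({3}, {1, 2, 3})
coarse = generalize(table, "age", lambda a: "young" if a < 35 else "old")
print(approximations(coarse, ["age"], concept))  # (set(), {1, 2, 3, 4})
```

Coarsening `age` merges equivalence classes, so the lower approximation shrinks and the upper one grows. An incremental method aims to derive the new approximations from the old ones instead of repartitioning the whole table.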
The Chaoshan depression, a Mesozoic basin in the Dongsha sea area of the northern South China Sea, is characterized by well-preserved Mesozoic strata and good conditions for hydrocarbon preservation, promising good prospects for oil and gas exploration. However, owing to limited seismic surveying, no breakthrough in oil and gas exploration in the Mesozoic strata has yet been achieved. New long-offset seismic data, acquired on a dense grid with a single source and a single cable, were processed with a 3D imaging method, and finer processing was performed to highlight the target strata. Combining the new imaging results with other geological information, we conducted an integrated interpretation and proposed an exploratory well, A-1-1, for potential hydrocarbons. The results provide a reliable basis for achieving breakthroughs in oil and gas exploration in the Mesozoic strata of the northern South China Sea.
Data mining (also known as Knowledge Discovery in Databases, KDD) is defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Its aim is to discover knowledge that meets users' needs. Data mining is a useful tool in many domains, such as marketing and decision making. However, some basic issues of data mining have been overlooked. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? Are there rules that a data mining process should obey? To discover patterns and knowledge that are genuinely interesting and actionable in the real world, Zhang et al. proposed a domain-driven, human-machine-cooperated data mining process. Zhao and Yao proposed an interactive, user-driven classification method using the granule network. In our work, we find that data mining is a knowledge transformation process that converts knowledge from data format into symbol format. Thus, no new knowledge is generated in a data mining process. Knowledge is only transformed from data format, which humans cannot understand, into symbol format, which humans can understand and use easily. This is similar to translating a book from Chinese into English. In translation, the knowledge in the book should remain unchanged, and only its format changes. That is, the English book should contain the same knowledge as the Chinese one; otherwise, mistakes must have been made in the translation. Likewise, in data mining we transform knowledge from one format into another without producing new knowledge. The knowledge is originally stored in data (data is a representation format of knowledge). Unfortunately, we cannot read, understand, or use it directly, because we cannot understand data.
With this understanding of data mining, we proposed a data-driven knowledge acquisition method based on rough sets, which also improves on the performance of classical knowledge acquisition methods. We also find that domain-driven and user-driven data mining do not conflict with our data-driven data mining; they can be integrated into domain-oriented data-driven data mining. This is analogous to views of a database: users with different views see different parts of the same database. Likewise, users with different tasks or objectives may wish to discover, and can discover, different (partial) knowledge from the same database. However, all of this partial knowledge must already exist in the database. A domain-oriented data-driven data mining method would therefore help us extract knowledge that really exists in a database and is genuinely interesting and actionable in the real world.
For satellite remote sensing data retrieved from visible and infrared bands, cloud cover over the ocean often leaves large-scale gaps in the retrieval products, and thin clouds that are difficult to detect make the retrieved data abnormal. Alvera et al. (2005) proposed a method for reconstructing missing data based on an Empirical Orthogonal Function (EOF) decomposition. However, the method could not process images with extreme cloud coverage (more than 95%) and required a long reconstruction time. In addition, abnormal data in the images strongly affected the reconstruction result. This paper therefore improves on that method by applying EOF decomposition twice to reconstruct the missing data sets. First, abnormal times are detected by analyzing the temporal modes of the EOF decomposition, and the abnormal data are eliminated. Second, the data sets without the abnormal data are analyzed by EOF decomposition, and the temporal modes are filtered to improve the EOF reconstruction of images that contain little or no data. Finally, the method is applied to a large data set of 43 Sea Surface Temperature (SST) satellite images of the Changjiang River (Yangtze River) estuary and its adjacent areas, with a total reconstruction root mean square error (RMSE) of 0.82°C. The results show that the improved EOF reconstruction method is robust for reconstructing missing and unreliable satellite data.
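The basic EOF gap-filling step that this line of work builds on can be sketched in Python with NumPy as an iterative truncated-SVD reconstruction. This is a DINEOF-style assumption. The number of modes is fixed by hand here rather than chosen by cross-validation. The paper's two additions are not included: anomaly detection from the temporal modes and filtering of those modes. `eof_fill` and its parameters are hypothetical names.

```python
import numpy as np

# Illustrative DINEOF-style sketch; `eof_fill` and its parameters are hypothetical.


def eof_fill(data, n_modes=1, n_iter=200, tol=1e-8):
    """Fill NaN gaps in a (space x time) matrix by iterative truncated-EOF reconstruction."""
    X = np.array(data, dtype=float)
    mask = np.isnan(X)
    mean = np.nanmean(X)
    X = X - mean
    X[mask] = 0.0  # first guess for gaps: the mean (zero anomaly)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        # Reconstruct from the leading EOFs (spatial modes U, temporal modes Vt).
        recon = (U[:, :n_modes] * s[:n_modes]) @ Vt[:n_modes]
        change = np.max(np.abs(recon[mask] - X[mask])) if mask.any() else 0.0
        X[mask] = recon[mask]  # only the gaps are updated; observations are kept
        if change < tol:
            break
    return X + mean


space, time = np.arange(1.0, 21.0), np.linspace(1.0, 2.0, 15)
truth = np.outer(space, time)          # synthetic field
obs = truth.copy()
obs[3, 4] = obs[10, 7] = np.nan        # two "cloudy" pixels
filled = eof_fill(obs, n_modes=2, n_iter=500, tol=1e-10)
print(np.max(np.abs(filled - truth)))
```

After the mean is removed, this synthetic field has an exact two-mode structure, so the gaps converge to the true values. Real SST fields need the number of modes tuned, for example by withholding some valid pixels and checking the RMSE on them.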
A novel binary particle swarm optimization algorithm for mining frequent itemsets from high-dimensional datasets (BPSO-HD) was proposed, incorporating two improvements. First, the dimensionality of the initial particles was reduced to ensure a reasonable initial fitness. Second, the dimensionality of the dataset was cut dynamically to shrink the search space. On four high-dimensional datasets, BPSO-HD was compared with Apriori to test its reliability, and with the ordinary BPSO and the quantum swarm evolutionary (QSE) algorithm to demonstrate its advantages. The experiments show that the results given by BPSO-HD are reliable and better than those generated by BPSO and QSE.
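The Python sketch below is a compact binary PSO for frequent itemsets. It is an assumption-laden illustration, not BPSO-HD itself:

- each particle is a bit vector over items;
- a hypothetical fitness rewards long itemsets that meet `min_sup`;
- a sigmoid maps velocities to bit probabilities;
- a one-off pruning of infrequent single items stands in for the paper's dimensionality reduction.

The paper's dynamic cutting during the search is not reproduced.

```python
import math
import random

# Illustrative binary PSO; the fitness function and all names are hypothetical.


def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def bpso_frequent_itemset(transactions, min_sup, n_particles=20, n_iter=50,
                          w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO search for a long frequent itemset (fitness = itemset size)."""
    rng = random.Random(seed)
    items = sorted(set().union(*transactions))
    # Apriori property: an infrequent single item can never be part of a frequent
    # itemset, so dropping those dimensions shrinks the search space losslessly.
    dims = [i for i in items if support({i}, transactions) >= min_sup]
    n = len(dims)

    def decode(bits):
        return frozenset(d for d, b in zip(dims, bits) if b)

    def fitness(bits):
        s = decode(bits)
        return len(s) if support(s, transactions) >= min_sup else 0

    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    gbest, gbest_fit = [0] * n, 0  # the empty itemset is always frequent
    for p, f in zip(pbest, pbest_fit):
        if f > gbest_fit:
            gbest, gbest_fit = p[:], f
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(n):
                v = (w * vel[k][d]
                     + c1 * rng.random() * (pbest[k][d] - pos[k][d])
                     + c2 * rng.random() * (gbest[d] - pos[k][d]))
                vel[k][d] = max(-v_max, min(v_max, v))
                # Sigmoid transfer: the velocity sets the probability that the bit is 1.
                pos[k][d] = int(rng.random() < 1 / (1 + math.exp(-vel[k][d])))
            f = fitness(pos[k])
            if f > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[k][:], f
    return decode(gbest)


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c", "d"}, {"a", "c"}, {"b", "c", "e"}]
best = bpso_frequent_itemset(transactions, min_sup=0.4)
print(sorted(best), support(best, transactions))
```

The pruning relies only on the Apriori property, so it never discards a frequent itemset. Here items `d` and `e` (support 0.2) are removed before the swarm starts.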
This paper proposes a long-term forecasting scheme and implementation method for traffic flow data based on interval type-2 fuzzy set theory. Type-2 fuzzy sets have advantages in modeling uncertainty because their membership functions are themselves fuzzy. The scheme includes a traffic flow data preprocessing module, a type-2 fuzzification module, and a long-term traffic flow forecasting output module, with the Interval Approach as the core algorithm. The central limit theorem is used to convert massive point data of traffic flow within a time range into interval data for the same time range (also called confidence interval data). These interval data serve as the input to the Interval Approach. The confidence interval data retain the uncertainty and randomness of traffic flow while reducing the influence of noise in the detector data. The proposed scheme produces not only a traffic flow forecast but also, through its upper and lower limit forecasts, the likely range of traffic flow variation with high precision. Its effectiveness is verified on a real-world sample application.
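The point-to-interval conversion can be sketched in Python as a standard CLT confidence interval for the mean of each time window. The window length, the 95% level (z = 1.96), the sample counts, and the function names are assumptions for illustration. The paper's exact construction may differ.

```python
import math
import statistics

# Illustrative CLT interval conversion; window size, z, and names are assumptions.


def confidence_interval(samples, z=1.96):
    """CLT-based interval for the mean of `samples` (z = 1.96 gives about 95%)."""
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - half_width, mean + half_width


def to_interval_series(counts, window):
    """Turn a point series into one interval per non-overlapping time window."""
    return [confidence_interval(counts[i:i + window])
            for i in range(0, len(counts) - window + 1, window)]


flow = [10, 12, 14, 20, 22, 24]  # hypothetical vehicle counts per sampling period
print(to_interval_series(flow, 3))
```

Each interval keeps the spread of the raw counts in its width while averaging away individual noisy readings. This matches the role the abstract gives the confidence interval data.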
Decoy-state quantum key distribution (QKD) is a promising practical solution for BB84 QKD with coherent light pulses. The data-set size in a practical QKD protocol is always finite, which causes statistical fluctuations. In this paper, we apply absolute statistical fluctuation to correct the yield and error rate of the quantum states. Our simulation analyzes the relationship between the number of exchanged quantum signals and the key generation rate, which offers a useful reference for experiments.
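Decoy-state analyses commonly handle finite-size effects by widening each measured gain or error rate by u_α standard deviations of the counting statistics. That treatment is assumed here; the paper's "absolute statistical fluctuation" bound may be defined differently. Under it, the bounds tighten as the number of exchanged signals N grows. The Python sketch below shows that scaling. `u_alpha = 5` and the gain value are illustrative choices, and `fluctuation_bounds` is a hypothetical name.

```python
import math

# Gaussian-approximation fluctuation bounds (assumed form); names and values are hypothetical.


def fluctuation_bounds(rate, n_signals, u_alpha=5.0):
    """Bounds on an observed rate (gain or yield) measured over n_signals pulses,
    widened by u_alpha standard deviations of the photon-count statistics."""
    delta = u_alpha / math.sqrt(n_signals * rate)
    return rate * (1 - delta), rate * (1 + delta)


gain = 1e-3  # hypothetical measured gain
for n in (10**6, 10**8, 10**10):
    low, high = fluctuation_bounds(gain, n)
    print(f"N = {n:.0e}: relative width {(high - low) / gain:.4f}")
```

Loose bounds at small N lower the key rate that can be guaranteed. The paper explores this relationship between signal count and key rate by simulation.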
An attempt has been made to apply a novel genetic programming (GP) technique, a new member of the family of evolutionary algorithms, to a small data set. The aim is to predict how the water storage of the Wolonghu wetland in northeastern China responds to climate change. Fourteen years (1993-2006) of annual water storage and climatic data for the wetland were used for model training and testing. The simulations and predictions showed a good fit between calculated and observed water storage (MAPE = 9.47, r = 0.99). For comparison, a multilayer perceptron (MLP, a popular artificial neural network model) and a grey model (GM) were applied to the same data set. The GP technique performed better than the other two methods in both the simulation and prediction phases, and the results are analyzed and discussed. The case study confirms that GP is a promising way for wetland managers to quickly estimate fluctuations in wetland water storage when only limited data are available.
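The two goodness-of-fit figures quoted above can be computed in Python as follows. The definitions are the standard ones: MAPE reported in percent, and Pearson's r. Whether the paper scaled MAPE the same way is an assumption, and the data are made up.

```python
import math

# Standard fit metrics; the observed/predicted values below are hypothetical.


def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


observed = [100.0, 200.0, 150.0]   # hypothetical water storage values
predicted = [110.0, 180.0, 150.0]
print(round(mape(observed, predicted), 2), round(pearson_r(observed, predicted), 3))
```

A model can have high r but poor MAPE if it tracks the shape of the series with a systematic offset. Reporting both, as the abstract does, guards against that.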
Clustering techniques group raw data in a reasonable manner into disjoint clusters. Many clustering algorithms based on specific parameters have been proposed for handling large volumes of data. This paper focuses on cluster analysis based on neutrosophic set implication, namely a k-means algorithm combined with a threshold-based clustering technique. The algorithm addresses the shortcomings of the k-means clustering algorithm and overcomes the limitations of the threshold-based clustering algorithm. To evaluate the validity of the proposed method, several validity measures and validity indices are applied to the Iris dataset (from the University of California, Irvine, Machine Learning Repository). The same measures are applied to the proposed method and to the k-means and threshold-based clustering algorithms. The proposed method yields better-separated, more compact clusters and thus achieves higher validity indices. It also removes the limitations of the threshold-based clustering algorithm. This is confirmed by the validity measures and indices computed alongside the k-means and threshold-based clustering algorithms.
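One way to combine the two techniques is assumed here purely for illustration, and the neutrosophic-set component of the paper is not modeled. A threshold-based pass chooses the number of clusters and their initial centroids, and k-means then refines them. The Python sketch below uses hypothetical function names and toy 2-D points.

```python
import math

# Illustrative threshold + k-means hybrid; names, threshold, and data are hypothetical.


def threshold_leaders(points, threshold):
    """Threshold (leader) clustering: a point starts a new cluster when it is
    farther than `threshold` from every existing leader."""
    leaders = []
    for p in points:
        if all(math.dist(p, c) > threshold for c in leaders):
            leaders.append(p)
    return leaders


def kmeans(points, centroids, n_iter=100):
    """Plain k-means (Lloyd's algorithm) started from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else c
                   for cl, c in zip(clusters, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = threshold_leaders(pts, threshold=3.0)  # k and initial centroids from the threshold step
centers, groups = kmeans(pts, seeds)
print(len(seeds), [len(g) for g in groups])  # 2 [3, 3]
```

In this sketch the threshold step removes k-means' need for a preset k and its dependence on random initialization. The k-means refinement then reduces the order dependence of the single-pass threshold method.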
In this paper, we build a remote-sensing satellite imagery priori-information data set and propose an approach to evaluate the robustness of remote-sensing image feature detectors. The TH Priori-Information (TPI) data set, built from 2297 remote-sensing images, serves as a standardized high-resolution data set for studies of remote-sensing image features. TPI contains two components. The first is raw and calibrated remote-sensing images with high spatial and temporal resolutions (up to 2 m and 7 days, respectively). The second is a built-in 3-D target-area model that supports view-position, view-angle, lighting, shadowing, and other transformations. Based on TPI, we further present a quantitative approach to evaluate the robustness of remote-sensing image feature detectors. It comprises the feature recurrence rate, the feature match score, and the weighted feature robustness score. This approach gives general and objective assessments of detector robustness under complex remote-sensing conditions. Three feature detectors are evaluated with the proposed approach on the TPI data set: scale-invariant feature transform (SIFT), speeded up robust features (SURF), and priori information based robust features (PIRF). Experimental results show that PIRF outperforms the others in robustness by over 6.2%.
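The paper's exact formulas for the feature recurrence rate and match score are not given here, so the Python sketch below assumes a repeatability-style definition. TPI supplies the ground-truth geometry between views. Each reference keypoint is therefore mapped into the second image and counted as recurring if a detected keypoint lies within `eps` pixels. A planar 3×3 homography is a simplifying assumption standing in for the 3-D model. All names are hypothetical.

```python
import math

# Assumed repeatability-style recurrence rate; names, transform, and points are hypothetical.


def apply_homography(H, point):
    """Map an (x, y) point through a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def recurrence_rate(ref_keypoints, test_keypoints, H, eps=2.0):
    """Fraction of reference keypoints that reappear (within eps pixels)
    at their ground-truth location in the transformed image."""
    if not ref_keypoints:
        return 0.0
    hits = sum(
        any(math.dist(apply_homography(H, p), q) <= eps for q in test_keypoints)
        for p in ref_keypoints
    )
    return hits / len(ref_keypoints)


shift = [[1, 0, 5], [0, 1, 0], [0, 0, 1]]  # known transform: 5-pixel shift in x
ref = [(0, 0), (10, 10), (20, 5)]
test = [(5, 0), (15.5, 10), (100, 100)]
print(recurrence_rate(ref, test, shift))  # 0.6666666666666666
```

Averaging such a rate over many controlled view, lighting, and shadow changes is one plausible basis for a weighted robustness score.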
In gene prediction, Fisher discriminant analysis (FDA) is used to separate protein-coding regions (exons) from non-coding regions (introns). When enough data are available, the positive and negative data sets are usually of the same size. When the data are insufficient or the two sets differ in size, however, the threshold used in FDA can strongly influence the prediction results. This paper presents a study on threshold selection. The feature values of each exon/intron sequence are computed using the Z-curve method with 69 variables. The experimental results suggest three key elements to consider for improving prediction results: the size of the data sets, their standard deviation, and the threshold.
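A small NumPy sketch of two-class FDA shows where the threshold enters. The data are a toy 2-D stand-in for the 69-variable Z-curve features. The two cut-off rules are common choices assumed for illustration, not the paper's selection rule: the midpoint between class means, and a point balanced by standard deviation.

```python
import numpy as np

# Illustrative two-class FDA; data and threshold rules are assumptions, not the paper's.


def fisher_direction(X_pos, X_neg):
    """Fisher projection w = Sw^-1 (m_pos - m_neg)."""
    m_pos, m_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Within-class scatter = sum of the two classes' scatter matrices.
    Sw = (np.cov(X_pos, rowvar=False) * (len(X_pos) - 1)
          + np.cov(X_neg, rowvar=False) * (len(X_neg) - 1))
    return np.linalg.solve(Sw, m_pos - m_neg)


def thresholds(X_pos, X_neg, w):
    """Two candidate cut-offs on the projected scores X @ w."""
    p, n = X_pos @ w, X_neg @ w
    midpoint = (p.mean() + n.mean()) / 2
    # Cut-off that lies the same number of standard deviations from each class mean.
    sd_balanced = (p.mean() * n.std() + n.mean() * p.std()) / (p.std() + n.std())
    return midpoint, sd_balanced


X_pos = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [3.0, 2.0]])  # toy "exon" features
X_neg = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])  # toy "intron" features
w = fisher_direction(X_pos, X_neg)
print(w, thresholds(X_pos, X_neg, w))
```

The two cut-offs coincide here because both toy classes have the same spread. With unequal spreads they diverge, and when data are scarce that choice changes which sequences are labeled exons.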
Design of control strategies for gene regulatory networks is a challenging and important topic in systems biology. In this paper, we study the problem of finding both a minimum set of control nodes (control inputs) and a controller. A control node corresponds to a gene whose expression can be controlled. A Boolean network (BN) is used as the model of gene regulatory networks. Control specifications are imposed on attractors, which represent cell types or cell states. The controlled network should have the desired attractors and no undesired ones. Using a matrix-based representation of BNs, this problem can be rewritten as an integer linear programming problem. Finally, the proposed method is demonstrated with a numerical example on a WNT5A network, which is related to melanoma.
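A brute-force run on a toy network shows what desired and undesired attractors mean and how a control node changes them. The Python sketch below simply enumerates all 2^n states, which is feasible only for small n. It is not the paper's matrix-based integer linear programming formulation, and the network and names are hypothetical.

```python
from itertools import product

# Brute-force attractor enumeration for illustration; network and names are hypothetical.


def attractors(rules, n, clamp=None):
    """Enumerate attractors of a synchronous Boolean network by exhaustive search.

    rules[i] maps a state tuple to the next value of node i; `clamp` maps a node
    index to a fixed value, modelling a control input that overrides that gene.
    """
    clamp = clamp or {}

    def update(state):
        nxt = [int(f(state)) for f in rules]
        for i, v in clamp.items():
            nxt[i] = v
        return tuple(nxt)

    found = set()
    for bits in product((0, 1), repeat=n):
        s = tuple(clamp.get(i, b) for i, b in enumerate(bits))
        seen = []
        while s not in seen:  # follow the trajectory until a state repeats
            seen.append(s)
            s = update(s)
        cycle = seen[seen.index(s):]
        k = cycle.index(min(cycle))  # rotate so each cycle has one canonical form
        found.add(tuple(cycle[k:] + cycle[:k]))
    return found


# Toy 3-gene network: x0' = x1, x1' = x0, x2' = x0 AND x1.
rules = [lambda s: s[1], lambda s: s[0], lambda s: s[0] and s[1]]
print(sorted(attractors(rules, 3)))
print(attractors(rules, 3, clamp={0: 1}))  # {((1, 1, 1),)}
```

Without control the toy network has three attractors: the fixed points (0, 0, 0) and (1, 1, 1) and a 2-cycle. Clamping gene 0 to 1 leaves (1, 1, 1) as the only attractor. The paper's method searches for a minimum set of such control nodes, plus a controller, that achieves a specified attractor set.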
The Arctic region is experiencing strong warming and related changes in the state of sea ice, permafrost, tundra, the marine environment, and terrestrial ecosystems. These changes appear in every climatological data set covering the Arctic region. This study compares the temperature trends in several surface, satellite, and reanalysis data sets. We demonstrate large differences in the 1979-2002 temperature trends: the data sets disagree on the magnitude of the trends as well as on their seasonal, zonal, and vertical patterns. We found that surface temperature trends are stronger than tropospheric temperature trends in every latitude band north of 50°N. This holds for every month except those of the ice-melting season. These results emphasize caution toward climate-study conclusions drawn from a single data set, because such conclusions may be affected by artificial biases in the data.
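Trend comparisons of this kind usually rest on an ordinary least-squares linear fit to each series, reported per decade. The Python sketch below computes that quantity for one series. The per-decade convention, the synthetic data, and the function name are assumptions, since the study's fitting details are not stated here.

```python
# Illustrative OLS trend per decade; data and function name are hypothetical.


def decadal_trend(years, values):
    """Ordinary least-squares linear trend, expressed per decade."""
    n = len(years)
    mean_x, mean_y = sum(years) / n, sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return 10.0 * sxy / sxx


years = list(range(1979, 2003))  # 1979-2002 inclusive
series = [0.05 * (y - 1979) for y in years]  # synthetic anomaly rising 0.05 degC per year
print(round(decadal_trend(years, series), 3))  # 0.5
```

Repeating the same fit for each data set, season, latitude band, and vertical level puts all trends on a common footing. Any remaining disagreement then reflects the data rather than the fitting method.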
With an increasing number of scientific results being published, literature-based knowledge discovery and data mining are becoming particularly important. Floods, among the most destructive natural disasters, have been the subject of numerous scientific publications. On January 1, 2018, we collected and processed literature data on flood research. We divided the retrieved paper records into a Whole SCI Dataset (WS) and a High-Citation SCI Dataset (HCS). These data sets can serve as basic data for bibliometric analysis to identify the status of global flood research during 1990-2017. Our study shows that the Chinese Academy of Sciences was the most productive institution during this period, while the United States was the most productive country. In addition, our keyword analysis reveals potential hot topics and future trends in flood research.
Rough set theory is a relatively new area of soft computing for handling uncertain big data efficiently. It also provides a powerful way to calculate the degree of importance of vague and uncertain big data to support decision making. Risk assessment is very important for safe and reliable investment. Risk management involves assessing risk sources and designing strategies and procedures to mitigate those risks to an acceptable level. In this paper, we focus on classifying different types of risk factors and on finding a simple and effective way to calculate risk exposure. The study uses the rough set method to classify and judge the safety attributes related to investment policy. The method, which is based on intelligent knowledge acquisition, provides an innovative way to perform risk analysis. With this approach, we can calculate the significance of each factor and the relative risk exposure from the original data, without assigning weights subjectively.
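The attribute-significance idea can be sketched with the standard rough-set dependency degree γ. An attribute's significance is how much γ drops when that attribute is removed, and normalizing these drops gives weights derived from the data alone. This Python sketch assumes that textbook definition; the paper's exact risk-exposure formula is not reproduced. The table and attribute names are hypothetical.

```python
from collections import defaultdict

# Textbook rough-set significance (assumed); table, attributes, and names are hypothetical.


def partition(rows, attrs):
    """Indices of rows grouped by their values on `attrs`."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def dependency(rows, cond, dec):
    """Rough-set dependency degree: share of rows whose condition class
    has a single decision value (the positive region)."""
    consistent = sum(len(eq) for eq in partition(rows, cond)
                     if len({rows[i][dec] for i in eq}) == 1)
    return consistent / len(rows)


def significance(rows, cond, dec):
    """Drop in dependency when each condition attribute is removed."""
    full = dependency(rows, cond, dec)
    return {a: full - dependency(rows, [c for c in cond if c != a], dec) for a in cond}


rows = [  # hypothetical risk table
    {"market": "high", "credit": "low", "liquidity": "low", "risk": "high"},
    {"market": "high", "credit": "high", "liquidity": "low", "risk": "high"},
    {"market": "low", "credit": "high", "liquidity": "low", "risk": "medium"},
    {"market": "low", "credit": "low", "liquidity": "low", "risk": "low"},
    {"market": "low", "credit": "low", "liquidity": "high", "risk": "low"},
]
sig = significance(rows, ["market", "credit", "liquidity"], "risk")
total = sum(sig.values())
weights = {a: s / total for a, s in sig.items()}  # data-derived, not subjective, weights
print(sig, weights)
```

In this table `liquidity` gets zero significance, because removing it leaves every condition class consistent. Removing `market` instead makes four of the five rows ambiguous.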
Funding (quantum minimum dominating set study): Project supported by the National Natural Science Foundation of China (Grant No. 62101600), the Science Foundation of China University of Petroleum, Beijing (Grant No. 2462021YJRC008), and the State Key Laboratory of Cryptology (Grant No. MMKFKT202109).
Funding (Chaoshan depression study): Supported by the Key Special Project for Introduced Talents Team of Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou) (No. GML2019ZD0208), the National Natural Science Foundation of China (No. 41606030), the Science and Technology Program of Guangzhou (No. 202102080363), and the China Geological Survey projects (Nos. DD20190212, DD20190216).
Funding: Supported by the National Natural Science Foundation of China (60675039), the National High Technology Research and Development Program of China (863 Program) (2006AA04Z217), and the Hundred Talents Program of the Chinese Academy of Sciences.
Funding (EOF reconstruction study): The National Natural Science Foundation of China under contract Nos. 40576080 and 40506036, and the National "863" Project of China under contract No. 2007AA12Z182.
文摘As for the satellite remote sensing data obtained by the visible and infrared bands inversion, the clouds coverage in the sky over the ocean often results in missing data of inversion products on a large scale, and thin clouds di?cult to be detected would cause the data of the inversion products to be abnormal. Alvera et al.(2005) proposed a method for the reconstruction of missing data based on an Empirical Orthogonal Functions (EOF) decomposition, but his method couldn’t process these images presenting extreme cloud coverage(more than 95%), and required a long time for recon- struction. Besides, the abnormal data in the images had a great effect on the reconstruction result. Therefore, this paper tries to improve the study result. It has reconstructed missing data sets by twice applying EOF decomposition method. Firstly, the abnormity time has been detected by an- alyzing the temporal modes of EOF decomposition, and the abnormal data have been eliminated. Secondly, the data sets, excluding the abnormal data, are analyzed by using EOF decomposition, and then the temporal modes undergo a filtering process so as to enhance the ability of reconstruct- ing the images which are of no or just a little data, by using EOF. At last, this method has been applied to a large data set, i.e. 43 Sea Surface Temperature (SST) satellite images of the Changjiang River (Yangtze River) estuary and its adjacent areas, and the total reconstruction root mean square error (RMSE) is 0.82°C. And it has been proved that this improved EOF reconstruction method is robust for reconstructing satellite missing data and unreliable data.
Abstract: A novel binary particle swarm optimization for mining frequent item sets from high-dimensional datasets (BPSO-HD) was proposed, incorporating two improvements. First, dimensionality reduction of the initial particles was designed to ensure reasonable initial fitness; second, dynamic dimensionality cutting of the dataset was built to shrink the search space. On four high-dimensional datasets, BPSO-HD was compared with Apriori to test its reliability, and with the ordinary BPSO and the quantum swarm evolutionary (QSE) algorithm to demonstrate its advantages. The experiments show that the results given by BPSO-HD are reliable and better than those generated by BPSO and QSE.
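As a rough illustration of the underlying search, here is a minimal standard binary PSO for frequent item sets, assuming sigmoid-mapped velocities and a hypothetical fitness (the itemset's size if its support reaches `min_sup`, else 0). It does not implement BPSO-HD's initial dimensionality reduction or dynamic dimensionality cutting; all names and parameter values are illustrative.

```python
import math
import random

def support(itemset, transactions):
    """Fraction of transactions (sets) that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def fitness(bits, transactions, min_sup):
    """Hypothetical fitness: itemset size if frequent, else 0."""
    itemset = {i for i, b in enumerate(bits) if b}
    if itemset and support(itemset, transactions) >= min_sup:
        return len(itemset)
    return 0

def bpso_frequent_itemset(transactions, n_items, min_sup, n_particles=20,
                          n_iter=50, w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    rng = random.Random(seed)
    # Each particle is a bit vector: bit d = 1 means item d is in the itemset
    pos = [[rng.randint(0, 1) for _ in range(n_items)] for _ in range(n_particles)]
    vel = [[0.0] * n_items for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, transactions, min_sup) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(n_items):
                vel[k][d] = (w * vel[k][d]
                             + c1 * rng.random() * (pbest[k][d] - pos[k][d])
                             + c2 * rng.random() * (gbest[d] - pos[k][d]))
                vel[k][d] = max(-v_max, min(v_max, vel[k][d]))
                # Sigmoid of the velocity is the probability that the bit becomes 1
                pos[k][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[k][d])) else 0
            f = fitness(pos[k], transactions, min_sup)
            if f > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[k][:], f
    return {i for i, b in enumerate(gbest) if b}, gbest_fit

tx = [{0, 1, 2}, {0, 1}, {0, 1, 3}, {0, 2}]
best, fit = bpso_frequent_itemset(tx, n_items=4, min_sup=0.5)
```

The search space is the 2^n_items bit vectors, which is why reducing the number of dimensions (items) matters so much on high-dimensional data.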
Funding: Supported by the Fundamental Research Funds for the Central Universities (2014JBM007).
Abstract: This paper proposes a long-term forecasting scheme and implementation method for traffic flow data based on interval type-2 fuzzy set theory. Type-2 fuzzy sets have advantages in modeling uncertainty because their membership functions are themselves fuzzy. The scheme comprises a traffic flow data preprocessing module, a type-2 fuzzification module, and a long-term traffic flow forecasting output module, with the Interval Approach as the core algorithm. The central limit theorem is used to convert point data of mass traffic flow within a time range into interval data for the same time range (also called confidence interval data), which serve as the input of the Interval Approach. The confidence interval data retain the uncertainty and randomness of traffic flow while reducing the influence of noise in the detection data. The proposed scheme not only produces the traffic flow forecast but also, through its upper and lower limit forecasts, shows the possible range of traffic flow variation with high precision. The effectiveness of the proposed scheme is verified with an actual sample application.
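The central-limit-theorem step can be sketched as follows: the point counts in each time window are summarized by a confidence interval for their mean, mean ± z·s/√n. The 95% level (z = 1.96), the fixed non-overlapping windows, and the helper names are assumptions for illustration; the paper's exact preprocessing may differ.

```python
import math
from statistics import mean, stdev

def to_confidence_interval(samples, z=1.96):
    """CLT-based interval for the mean of the point samples in one time window."""
    m = mean(samples)
    half_width = z * stdev(samples) / math.sqrt(len(samples))
    return m - half_width, m + half_width

def windows_to_intervals(series, window, z=1.96):
    """Split a point series into non-overlapping windows and convert each to an interval."""
    return [to_confidence_interval(series[i:i + window], z)
            for i in range(0, len(series) - window + 1, window)]

# Four detector counts from one (hypothetical) time window
lo, hi = to_confidence_interval([100, 110, 90, 100])
```

Each resulting (lower, upper) pair is the kind of interval datum the Interval Approach takes as input.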
Funding: Supported by the National Basic Research Program (973) of China under Grant No. 2010CB923200 and the Chinese Universities Scientific Fund BUPT2009RC0709.
Abstract: The decoy state method for quantum key distribution (QKD) is one of the promising practical solutions for BB84 QKD with coherent light pulses. The data-set size in a practical QKD protocol is always finite, which causes statistical fluctuations. In this paper, we apply absolute statistical fluctuation to amend the yield and error rate of the quantum states. The relationship between the number of exchanged quantum signals and the key generation rate is analyzed in our simulation, which offers a useful reference for experiments.
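One commonly used way to amend an observed rate for finite-size effects is to bound it by Q(1 ± u_α/√(NQ)), where Q is the observed gain (or error-weighted gain EQ), N the number of pulses sent at that intensity, and u_α the number of standard deviations allowed. This is an assumption about the form of the correction; the paper's "absolute statistical fluctuation" formula may differ, and `fluctuation_bounds` is a hypothetical helper.

```python
import math

def fluctuation_bounds(rate, n_pulses, u_alpha=5.0):
    """Lower/upper bounds on an observed rate from n_pulses signals,
    allowing u_alpha standard deviations of statistical fluctuation."""
    if rate <= 0 or n_pulses <= 0:
        raise ValueError("rate and n_pulses must be positive")
    delta = u_alpha / math.sqrt(n_pulses * rate)
    return rate * (1 - delta), rate * (1 + delta)

# Same observed gain, two data-set sizes
lo, hi = fluctuation_bounds(0.01, 10**8, u_alpha=5.0)
lo2, hi2 = fluctuation_bounds(0.01, 10**10, u_alpha=5.0)
```

Because the relative width shrinks as 1/√N, exchanging more signals tightens the bounds on yield and error rate, which is what links data-set size to the achievable key rate.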
Funding: Sponsored by the National Basic Research Program of China (Grant No. 2006CB403302), the National Education Ministry Foundation of China (Grant No. 705011), and the National Special Science and Technology Program on Water Pollution Control and Treatment (Grant Nos. 2009ZX07526-006 and 2008AX07208-001).
Abstract: An attempt has been made to apply a novel genetic programming (GP) technique, a new member of the family of evolutionary algorithms, to predict the water storage of the Wolonghu wetland in northeastern China in response to climate change using a small data set. Fourteen years (1993-2006) of annual water storage and climatic data for the wetland were used for model training and testing. The simulations and predictions showed a good fit between calculated and observed water storage (MAPE = 9.47, r = 0.99). For comparison, a multilayer perceptron (MLP), a popular artificial neural network model, and a grey model (GM) were applied to the same data set. The GP technique performed better than the other two methods in both the simulation and prediction phases, and the results are analyzed and discussed. The case study confirms that GP is a promising way for wetland managers to quickly estimate fluctuations in water storage in some wetlands when only a small data set is available.
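The two reported goodness-of-fit statistics can be computed as below. This assumes the reported MAPE is expressed in percent and r is the Pearson correlation coefficient between observed and calculated water storage; the values in the usage lines are illustrative, not the study's data.

```python
import math

def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

err = mape([100, 200], [110, 180])    # both predictions are off by 10%
r = pearson_r([1, 2, 3], [2, 4, 6])   # perfectly linear relationship
```

MAPE measures the size of the errors relative to each observation, while r only measures how well the two series co-vary, so a model can have r near 1 and still a noticeable MAPE.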
Abstract: Clustering techniques classify raw data in a reasonable manner into disjoint clusters. Many clustering algorithms based on specific parameters have been proposed to handle high volumes of data. This paper focuses on cluster analysis based on neutrosophic set implication, i.e., a k-means algorithm combined with a threshold-based clustering technique. This algorithm addresses the shortcomings of the k-means clustering algorithm while overcoming the limitations of the threshold-based clustering algorithm. To evaluate the validity of the proposed method, several validity measures and validity indices are applied to the Iris dataset (from the University of California, Irvine, Machine Learning Repository), together with the k-means and threshold-based clustering algorithms. The proposed method produces more segregated datasets with compact clusters, thus achieving higher validity indices. The method also eliminates the limitations of the threshold-based clustering algorithm and is validated with the same measures and indices used for the k-means and threshold-based clustering algorithms.
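To make the combination concrete, here is a minimal sketch of one way to pair a threshold-based (leader) clustering pass with k-means: the threshold pass chooses the number of clusters and the initial centroids, and k-means then refines them. This hybrid is an illustrative assumption; it omits the neutrosophic set implication entirely, and `threshold_seeds`, `assign`, and `kmeans` are hypothetical helpers.

```python
import numpy as np

def threshold_seeds(X, threshold):
    """Leader clustering: a point farther than `threshold` from every leader becomes a new leader."""
    leaders = [X[0]]
    for x in X[1:]:
        if min(np.linalg.norm(x - c) for c in leaders) > threshold:
            leaders.append(x)
    return np.array(leaders)

def assign(X, centroids):
    """Label each point with the index of its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(X, centroids, n_iter=100):
    """Standard k-means refinement starting from the given centroids."""
    centroids = np.asarray(centroids, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Keep an empty cluster's centroid unchanged instead of dividing by zero
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
                        for k in range(len(centroids))])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(X, centroids), centroids

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
seeds = threshold_seeds(X, threshold=3.0)
labels, cents = kmeans(X, seeds)
```

The threshold pass removes k-means' need for a user-chosen k and random seeds, while k-means removes the threshold pass's dependence on point order for the final centroids.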
Funding: Supported in part by the National Key Research and Development Program of China under Grant 2018YFF0301205, and in part by the National Natural Science Foundation of China under Grant NSFC 61925105 and Grant 61801260.
Abstract: In this paper, we build a priori-information data set of remote-sensing satellite imagery and propose an approach to evaluate the robustness of remote-sensing image feature detectors. The TH Priori-Information (TPI) data set, built from 2297 remote sensing images, serves as a standardized high-resolution data set for studies related to remote-sensing image features. The TPI contains 1) raw and calibrated remote-sensing images with high spatial and temporal resolutions (up to 2 m and 7 days, respectively), and 2) a built-in 3-D target area model that supports view position, view angle, lighting, shadowing, and other transformations. Based on TPI, we further present a quantized approach, comprising the feature recurrence rate, the feature match score, and the weighted feature robustness score, to evaluate the robustness of remote-sensing image feature detectors. The quantized approach gives general and objective assessments of the robustness of feature detectors under complex remote-sensing conditions. Three remote-sensing image feature detectors, scale-invariant feature transform (SIFT), speeded up robust features (SURF), and priori information based robust features (PIRF), are evaluated on the TPI data set using the proposed approach. Experimental results show that PIRF outperforms the other detectors in robustness by over 6.2%.
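The abstract does not give the formula for the feature recurrence rate, so the sketch below uses an assumed, repeatability-style definition: the share of keypoints from a reference image that, once mapped into the transformed image by a known homography `H`, have a detected keypoint within `eps` pixels. The function name, the homography model, and `eps` are all hypothetical.

```python
import numpy as np

def recurrence_rate(kp_ref, kp_test, H, eps=2.0):
    """Assumed definition: fraction of reference keypoints re-detected within eps
    pixels after mapping them into the test image with the 3x3 homography H."""
    kp_ref = np.asarray(kp_ref, dtype=float)
    kp_test = np.asarray(kp_test, dtype=float)
    if len(kp_ref) == 0 or len(kp_test) == 0:
        return 0.0
    # Map reference points through H in homogeneous coordinates
    homog = np.hstack([kp_ref, np.ones((len(kp_ref), 1))])
    mapped = homog @ H.T
    mapped = mapped[:, :2] / mapped[:, 2:3]
    # Distance from each mapped point to its nearest detected point
    d = np.linalg.norm(mapped[:, None, :] - kp_test[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) <= eps))

H = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # shift by 5 px in x
r = recurrence_rate([[0, 0], [10, 10], [20, 5]], [[5, 0.5], [15, 10], [100, 100]], H)
```

A built-in 3-D target model such as TPI's supplies exactly this kind of known geometric mapping, which is what makes a ground-truth recurrence check possible.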
Abstract: In gene prediction, Fisher discriminant analysis (FDA) is used to separate protein-coding regions (exons) from non-coding regions (introns). Usually, the positive and negative data sets are of the same size when enough data are available. However, when the data are insufficient or the two sets are unequal in size, the threshold used in FDA may strongly influence the prediction results. This paper presents a study on the selection of this threshold. The feature values of each exon/intron sequence are computed using the Z-curve method with 69 variables. The experimental results suggest that the size and standard deviation of the data sets, together with the threshold, are the three key elements to consider in improving prediction results.
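A minimal two-class Fisher discriminant shows where the threshold enters: w = S_w⁻¹(m₁ − m₂) gives the projection direction, and a threshold on the projected values decides the class. The standard-deviation-weighted threshold below, which places the cut proportionally closer to the tighter class, is one possible rule chosen for illustration, not the paper's selection method; the 2-D toy features stand in for the 69 Z-curve variables.

```python
import numpy as np

def fisher_direction(X_pos, X_neg):
    """Fisher direction w = Sw^-1 (m_pos - m_neg), with Sw the within-class scatter."""
    m1, m2 = X_pos.mean(axis=0), X_neg.mean(axis=0)
    d1, d2 = X_pos - m1, X_neg - m2
    Sw = d1.T @ d1 + d2.T @ d2
    return np.linalg.solve(Sw, m1 - m2)

def std_weighted_threshold(X_pos, X_neg, w):
    """Hypothetical rule: cut the projected axis at a distance from each class mean
    proportional to that class's standard deviation."""
    p, n = X_pos @ w, X_neg @ w
    s1, s2 = p.std(ddof=1), n.std(ddof=1)
    return (s2 * p.mean() + s1 * n.mean()) / (s1 + s2)

X_pos = np.array([[2, 2], [3, 3], [2, 3], [3, 2]], dtype=float)    # toy "exon" features
X_neg = np.array([[0, 0], [0, 1], [1, 0], [-1, -1]], dtype=float)  # toy "intron" features
w = fisher_direction(X_pos, X_neg)
t = std_weighted_threshold(X_pos, X_neg, w)
# A sequence with feature vector x is predicted as coding when x @ w > t
```

With equal-sized, equally spread classes this rule reduces to the midpoint of the projected means; it departs from the midpoint exactly when the spreads differ, which is the situation the abstract highlights.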
Abstract: Designing control strategies for gene regulatory networks is a challenging and important topic in systems biology. In this paper, the problem of finding both a minimum set of control nodes (control inputs) and a controller is studied. A control node corresponds to a gene whose expression can be controlled. Here, a Boolean network (BN) is used as a model of gene regulatory networks, and control specifications are imposed on attractors, which represent cell types or cell states. It is important to design a gene regulatory network that has the desired attractors and no undesired attractors. Using a matrix-based representation of BNs, this problem can be rewritten as an integer linear programming problem. Finally, the proposed method is demonstrated by a numerical example on a WNT5A network, which is related to melanoma.
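To illustrate what an attractor of a synchronous Boolean network is, the sketch below finds all attractors of a small network by exhaustive simulation from every state. This brute-force enumeration is only for illustration on tiny networks; it is not the paper's matrix-based representation or its integer linear programming formulation, and the two-gene network is hypothetical.

```python
from itertools import product

def attractors(update, n):
    """Enumerate the attractors (fixed points and cycles) of a synchronous
    Boolean network with n nodes by simulating from all 2**n states."""
    found = set()
    for state in product((0, 1), repeat=n):
        seen = []
        while state not in seen:
            seen.append(state)
            state = update(state)
        cycle = seen[seen.index(state):]
        # Canonical form: rotate the cycle so its smallest state comes first
        k = cycle.index(min(cycle))
        found.add(tuple(cycle[k:] + cycle[:k]))
    return sorted(found)

# Hypothetical two-gene network: each gene copies the other's previous value
result = attractors(lambda s: (s[1], s[0]), 2)
```

Here the network has two fixed points, (0, 0) and (1, 1), and one 2-cycle; a control design would add inputs so that, for example, only the desired fixed point remains.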
Abstract: The Arctic region is experiencing strong warming and related changes in the state of sea ice, permafrost, tundra, the marine environment, and terrestrial ecosystems. These changes appear in every climatological data set covering the Arctic region. This study compares the temperature trends in several surface, satellite, and reanalysis data sets. We demonstrate large differences in the 1979-2002 temperature trends. The data sets disagree on the magnitude of the trends as well as on their seasonal, zonal, and vertical patterns. We found that, for each latitude band north of 50°N, the surface temperature trends are stronger than the tropospheric temperature trends in every month except those of the ice-melting season. These results emphasize that conclusions of climate studies drawn from the analysis of a single data set should be treated with caution, as they may be affected by artificial biases in the data.
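Trend comparisons of this kind are typically based on a least-squares linear fit to each series, quoted per decade. The sketch below assumes an ordinary least-squares trend on annual values; the helper name and the synthetic series are illustrative, and the study's exact trend method is not stated in the abstract.

```python
import numpy as np

def trend_per_decade(years, values):
    """Ordinary least-squares linear trend of values vs. years, expressed per decade."""
    slope, _intercept = np.polyfit(np.asarray(years, dtype=float),
                                   np.asarray(values, dtype=float), 1)
    return 10.0 * slope

years = np.arange(1979, 2003)                  # 1979-2002, as in the study period
synthetic = 0.05 * (years - 1979) - 1.0        # synthetic anomaly rising 0.05 per year
trend = trend_per_decade(years, synthetic)
```

Applying the same function to each data set over the same years and latitude band makes the trends directly comparable, so any remaining differences reflect the data rather than the fitting method.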
Funding: National Key Research and Development Program of China (2016YFE0122600).
Abstract: With an increasing number of scientific achievements being published, it is particularly important to conduct literature-based knowledge discovery and data mining. Flood, one of the most destructive natural disasters, has been the subject of numerous scientific publications. On January 1, 2018, we collected and processed literature data on flood research and categorized the retrieved paper records into a Whole SCI Dataset (WS) and a High-Citation SCI Dataset (HCS). These data sets can serve as basic data for bibliometric analysis to identify the status of global flood research during 1990-2017. Our study shows that the Chinese Academy of Sciences was the most productive institution during this period, while the United States was the most productive country. In addition, our keyword analysis reveals potential hot topics and future trends in flood research.
Abstract: Rough set theory is a relatively new area of soft computing for handling uncertain big data efficiently. It also provides a powerful way to calculate the importance degree of vague and uncertain big data to support decision making. Risk assessment is very important for safe and reliable investment. Risk management involves assessing risk sources and designing strategies and procedures to mitigate those risks to an acceptable level. In this paper, we focus on classifying different types of risk factors and find a simple and effective way to calculate risk exposure. The study uses the rough set method to classify and judge the safety attributes related to investment policy. The method, which is based on intelligent knowledge acquisition, provides an innovative way for risk analysis. With this approach, we can calculate the significance of each factor and the relative risk exposure from the original data without subjectively assigning weights.
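Attribute significance in rough set theory is usually computed from the dependency degree γ_C(D), the fraction of objects whose indiscernibility class under the condition attributes C lies entirely inside one decision class, as σ(a) = γ_C(D) − γ_{C−{a}}(D). The sketch below implements these standard definitions; the tiny risk table and its attribute names are hypothetical, and the paper's risk-exposure calculation is not reproduced.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Indiscernibility classes: row indices grouped by their values on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def dependency(rows, cond, dec):
    """gamma_C(D): share of rows in the positive region of decision dec w.r.t. cond."""
    dec_blocks = partition(rows, [dec])
    pos = set()
    for block in partition(rows, cond):
        if any(block <= d for d in dec_blocks):
            pos |= block
    return len(pos) / len(rows)

def significance(rows, cond, dec, a):
    """sigma(a) = gamma_C(D) - gamma_{C - {a}}(D): drop in dependency when a is removed."""
    rest = [c for c in cond if c != a]
    return dependency(rows, cond, dec) - dependency(rows, rest, dec)

# Hypothetical decision table: two risk factors and a risk-level decision
rows = [
    {"liquidity": "low", "volatility": "high", "risk": "high"},
    {"liquidity": "low", "volatility": "low", "risk": "medium"},
    {"liquidity": "high", "volatility": "high", "risk": "high"},
    {"liquidity": "high", "volatility": "low", "risk": "low"},
]
cond = ["liquidity", "volatility"]
sig_vol = significance(rows, cond, "risk", "volatility")
sig_liq = significance(rows, cond, "risk", "liquidity")
```

Because σ(a) comes only from how the data partition, factor weights fall out of the table itself rather than being assigned subjectively, which is the property the abstract emphasizes.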