Abstract: This article outlines progress in designing an intelligent information system for automatic management and knowledge discovery in large numeric and scientific databases, with a validating application to the CAST-NEONS environmental databases used for ocean modeling and prediction. We describe a discovery-learning process (Automatic Data Analysis System) that combines the features of two machine learning techniques to generate sets of production rules that efficiently describe the raw observational data contained in the database. Data clustering allows the system to classify the raw data into meaningful conceptual clusters, which the system learns by induction to build decision trees, from which the production rules are automatically deduced.
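The cluster, induce, deduce pipeline described in this abstract can be illustrated with a minimal sketch. It is not the Automatic Data Analysis System itself: the abstract does not name the two techniques, so plain k-means and a one-level decision tree are assumed here, and the temperature/salinity observations, feature names, and all function names are hypothetical.

```python
from statistics import mean

def kmeans(points, k=2, iters=20):
    """Plain k-means with a deterministic start (assumed clustering method)."""
    step = max(1, (len(points) - 1) // (k - 1)) if k > 1 else 1
    centroids = [points[min(i * step, len(points) - 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return labels

def best_split(points, labels):
    """Induce a one-level tree: the (feature, threshold) with the fewest errors."""
    best = None
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [l for p, l in zip(points, labels) if p[f] <= t]
            right = [l for p, l in zip(points, labels) if p[f] > t]
            lmaj = max(set(left), key=left.count)
            rmaj = max(set(right), key=right.count)
            errors = sum(l != lmaj for l in left) + sum(l != rmaj for l in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, lmaj, rmaj)
    return best

def to_rules(split, names):
    """Read the tree off as IF/THEN production rules."""
    _, f, t, lmaj, rmaj = split
    return [
        f"IF {names[f]} <= {t:g} THEN cluster_{lmaj}",
        f"IF {names[f]} > {t:g} THEN cluster_{rmaj}",
    ]

# Hypothetical observations: (temperature, salinity).
points = [(2.0, 34.9), (2.5, 34.8), (3.0, 35.0), (18.0, 36.1), (19.5, 36.0), (20.0, 36.3)]
labels = kmeans(points, k=2)
rules = to_rules(best_split(points, labels), ["temperature", "salinity"])
for rule in rules:
    print(rule)
```

The clusters are found without labels, and the tree is then trained on the cluster assignments, so the resulting rules describe the clusters in terms of the original attributes.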
Abstract: Information systems are among the most rapidly changing and vulnerable systems, and security is a major issue for them. The number of security-breaking attempts originating inside organizations is increasing steadily. Such attacks, usually carried out by "authorized" users of the system, cannot be traced immediately. Because filtering traffic at the entrance, using firewalls and the like, is not completely successful, intrusion detection systems should be considered to increase the defense capacity of an information system. An intrusion detection system (IDS) usually works in a dynamically changing environment, which forces continuous tuning of the intrusion detection model in order to maintain sufficient performance. The manual tuning process required by current IDS depends on the system operators to work out the tuning solution and integrate it into the detection model. Furthermore, extensive effort is required to tackle newly evolving attacks, and deep study is necessary to categorize them into their respective classes. To reduce this dependence, an automatically evolving anomaly IDS using a neuro-genetic algorithm is presented. The proposed system automatically tunes the detection model on the fly according to the feedback provided by the system operator when false predictions are encountered. The system has been evaluated using the Knowledge Discovery in Databases Conference (KDD 2009) intrusion detection dataset. A genetic paradigm is employed to choose the predominant features, which reveal the occurrence of intrusions. The neuro-genetic IDS (NGIDS) calculates a weightage value for each of the categorical attributes so that data of uniform representation can be processed by the neuro-genetic algorithm. In this system, unauthorized intrusions by a user are identified, and newer types of attacks are sensed and classified by the neuro-genetic algorithm. The experimental results show that the system improves on conventional IDS in terms of misclassification cost, and that it can be deployed in a real network or database environment for effective prediction of both normal attacks and new attacks.
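The abstract does not say how the weightage value of a categorical attribute is computed. As a minimal sketch of one possible scheme, assuming relative frequency is used as the weight, each category can be mapped to a number so that all attributes share a uniform numeric representation. The function name `frequency_weights` and the sample protocol values are hypothetical.

```python
from collections import Counter

def frequency_weights(values):
    """Map each category to its relative frequency (an assumed weighting scheme)."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

# Hypothetical categorical attribute from connection records.
protocols = ["tcp", "tcp", "udp", "icmp", "tcp", "udp"]
weights = frequency_weights(protocols)

# Replace every category by its weight so a numeric model can consume it.
encoded = [weights[p] for p in protocols]
print(weights)
print(encoded)
```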
Abstract: With massive amounts of data stored in databases, mining information and knowledge from databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and online services over the Internet, also call for various data mining and knowledge discovery techniques to understand user behavior better, improve the service provided, and increase business opportunities. In response to this demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems. In conclusion, it lists some problems and challenges for further research.
Abstract: It is important for telecom companies to make sense of the large volume of data they have accumulated over the years. This paper reviews the concepts and techniques of knowledge discovery in databases (KDD) and surveys applications of this technology in the telecommunications sector worldwide. It also discusses some possible applications of this technology in China and reports a preliminary result of the first attempt to apply KDD techniques to telephone traffic volume prediction. It concludes that KDD is a promising technology that can help enhance the competitiveness of China's telecom companies in the face of looming competition in a liberalized market.
Funding: Supported by the National Natural Science Foundation of China (No. 60133010, No. 70071042, No. 60073043) and the National Laboratory for Parallel and Distributed Processing
Abstract: We introduce the work on parallel problem solvers from physics and biology being developed by the research team at the State Key Laboratory of Software Engineering, Wuhan University. Results on parallel solvers cover the following areas: evolutionary algorithms, based on imitating the evolutionary processes of nature, for parallel problem solving, especially parallel optimization and model building; and asynchronous parallel algorithms based on domain decomposition, inspired by physical analogies such as the elastic relaxation process and the annealing process, for scientific computation, especially for solving nonlinear mathematical physics problems. All these algorithms share the following characteristics: inherent parallelism, self-adaptation, and self-organization, because the basic ideas of these solvers come from imitating natural evolutionary processes.
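As a minimal sketch of the kind of evolutionary algorithm this abstract refers to, and not the Wuhan University team's solvers, the loop below evolves a population toward the minimum of a test function. The function names, the sphere objective, and all parameter values are assumptions for illustration. The sketch runs sequentially; the inherent parallelism lies in the fact that each individual's fitness can be evaluated independently of the others.

```python
import random

def evolve(fitness, dim=2, pop_size=20, generations=50, sigma=0.3, seed=0):
    """Truncation selection with Gaussian mutation; lower fitness is better."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Fitness evaluations are independent, so this step could be parallelized.
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        # Keep the better half unchanged (elitism), refill with mutated copies.
        parents = pop[: pop_size // 2]
        children = [
            [gene + rng.gauss(0, sigma) for gene in rng.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    best = min(pop, key=fitness)
    history.append(fitness(best))
    return best, history

def sphere(x):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(g * g for g in x)

best, history = evolve(sphere)
print(best, history[-1])
```

Because the better half of each generation survives unchanged, the best fitness recorded in `history` never gets worse from one generation to the next.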