Association rules’learning is a machine learning method used in finding underlying associations in large datasets.Whether intentionally or unintentionally present,noise in training instances causes overfitting while ...Association rules’learning is a machine learning method used in finding underlying associations in large datasets.Whether intentionally or unintentionally present,noise in training instances causes overfitting while building the classifier and negatively impacts classification accuracy.This paper uses instance reduction techniques for the datasets before mining the association rules and building the classifier.Instance reduction techniques were originally developed to reduce memory requirements in instance-based learning.This paper utilizes them to remove noise from the dataset before training the association rules classifier.Extensive experiments were conducted to assess the accuracy of association rules with different instance reduction techniques,namely:DecrementalReduction Optimization Procedure(DROP)3,DROP5,ALL K-Nearest Neighbors(ALLKNN),Edited Nearest Neighbor(ENN),and Repeated Edited Nearest Neighbor(RENN)in different noise ratios.Experiments show that instance reduction techniques substantially improved the average classification accuracy on three different noise levels:0%,5%,and 10%.The RENN algorithm achieved the highest levels of accuracy with a significant improvement on seven out of eight used datasets from the University of California Irvine(UCI)machine learning repository.The improvements were more apparent in the 5%and the 10%noise cases.When RENN was applied,the average classification accuracy for the eight datasets in the zero-noise test enhanced from 70.47%to 76.65%compared to the original test.The average accuracy was improved from 66.08%to 77.47%for the 5%-noise case and from 59.89%to 77.59%in the 10%-noise case.Higher confidence was also reported in building the association rules when RENN was used.The above results indicate that RENN is a good solution in removing noise and avoiding overfitting during the construction of the association rules classifier,especially in noisy domains.展开更多
Objective:To analyze the rule of prescribing traditional Chinese medicine for treating pneumoconiosis,so as to provide reference for differential diagnosis and treatment of pneumoconiosis as well as for the developmen...Objective:To analyze the rule of prescribing traditional Chinese medicine for treating pneumoconiosis,so as to provide reference for differential diagnosis and treatment of pneumoconiosis as well as for the development of new drugs for treatingthe disease.Methods:We searched China National Knowledge Infrastructure,Wanfang Database and VIP Chinese PublicationDatabase to retrieve relevant literatures which were then screened according to the enrollment criteria to establish a prescriptiondatabase of traditional Chinese medicine for the treatment of pneumoconiosis.The inheritance calculation platform of traditionalChinese medicine was used to analyze the prescribing rule of traditional Chinese medicine in the treatment of pneumoconiosisbased on association rules,k-means clustering algorithm and regression model analysis.Results:A total of 131 related literature were preliminarily selected,from which 97 prescriptions of traditional Chinese medicine with a total of 195 herbs were included.The most frequently prescribed herbs included Radix astragali,Platycodon grandiflorum,Pinellia ternata,licorice,Codonopsispilosula,Salvia miltiorrhiza,bitter almond etc.A total of 14 association rules,13 high-frequency herb pairs were found and 5groups of formulas were revealed by cluster analysis.Conclusion:The prescriptions for the treatment of pneumoconiosis are mainly composed of herbs for tonifying deficiency,resolving phlegm,relieving cough and asthma,activating blood circulation and removingblood stasis,which are supplemented with herbs for clearing heat,relieving appearance,regulating qi,promoting waterand permeating dampness,etc.,The prescribing rules reflect the basic pathological characteristics of lung deficiency and collateral arthralgia in pneumoconiosis,which provides some ideas for the clinical differentiation and treatment of pneumoconiosis in traditionalChinese medicine.It also provides reference for the research and development of new treatment methods.展开更多
[Objectives]To explore the compatibility rules of neonatal parenteral nutrition(PN)prescriptions based on association rules and hierarchical cluster analysis,thereby providing a reference for standardizing neonatal pa...[Objectives]To explore the compatibility rules of neonatal parenteral nutrition(PN)prescriptions based on association rules and hierarchical cluster analysis,thereby providing a reference for standardizing neonatal parenteral nutrition supportive therapy.[Methods]The data about neonatal PN formulations prepared by the Pharmacy Intravenous Admixture Services(PIVAS)of the Affiliated Hospital of Chengde Medical University from July 2015 to June 2021 were collected.The general information of the prescriptions and the frequency of drug use were analyzed with Excel 2019;the boxplot of drug dosing was drawn using GraphPad 8.0 software;and SPSS Modeler 18.0 and SPSS Statistics 26.0 were used to perform association rules and hierarchical cluster analysis.[Results]A total of 11488 PN prescriptions were collected from 1421 newborns,involving 18 kinds of drugs,which were divided into 11 types of nutrients.Association rules analysis yielded 84 nutrient substance combinations.The combination of fat emulsion-water-soluble vitamins-fat-soluble vitamins-glucose-amino acids had the highest confidence(99.95%).The hierarchical cluster analysis divided nutrients into 5 types.[Conclusions]The prescriptions of PN for newborns were composed of five types of nutrients:amino acids,fat emulsion,glucose,water-soluble vitamins,and fat-soluble vitamins.According to the lack of electrolytes and trace elements,appropriate drugs can be chosen to meet nutritional demands.This study provides reference basis for reasonable selection of drugs for neonatal PN prescriptions and further standardization of PN supportive therapy in newborns.展开更多
Data mining techniques offer great opportunities for developing ethics lines whose main aim is to ensure improvements and compliance with the values, conduct and commitments making up the code of ethics. The aim of th...Data mining techniques offer great opportunities for developing ethics lines whose main aim is to ensure improvements and compliance with the values, conduct and commitments making up the code of ethics. The aim of this study is to suggest a process for exploiting the data generated by the data generated and collected from an ethics line by extracting rules of association and applying the Apriori algorithm. This makes it possible to identify anomalies and behaviour patterns requiring action to review, correct, promote or expand them, as appropriate.展开更多
As data mining more and more popular applied in computer system,the quality as-surance test of its software would be get more and more attention.However,because of the ex-istence of the 'oracle' problem,the tr...As data mining more and more popular applied in computer system,the quality as-surance test of its software would be get more and more attention.However,because of the ex-istence of the 'oracle' problem,the traditional test method is not ease fit for the application program in the field of the data mining.In this paper,based on metamorphic testing,a software testing method is proposed in the field of the data mining,makes an association rules algorithm as the specific case,and constructs the metamorphic relation on the algorithm.Experiences show that the method can achieve the testing target and is feasible to apply to other domain.展开更多
Although association rule mining is an important pattern recognition and data analysis technique, extracting and finding significant rules from a large collection has always been challenging. The ability of informatio...Although association rule mining is an important pattern recognition and data analysis technique, extracting and finding significant rules from a large collection has always been challenging. The ability of information visualization to enable users to gain an understanding of high dimensional and large-scale data can play a major role in the exploration, identification, and interpretation of association rules. In this paper, we propose a method that provides multiple views of the association rules, linked together through a filtering mechanism. A visual inspection of the entire association rule set is enabled within a matrix view. Items of interest can be selected, resulting in their corresponding association rules being shown in a graph view. At any time, individual rules can be selected in either view, resulting in their information being shown in the detail view. The fundamental premise in this work is that by providing such a visual and interactive representation of the association rules, users will be able to find important rules quickly and easily, even as the number of rules that must be inspected becomes large. A user evaluation was conducted which validates this premise.展开更多
Discovering cyclic generalized association rules from transaction datbases can reveal the relationship of differ-ent levels of the taxonomies and display cyclic variations over time.Information about such variations i...Discovering cyclic generalized association rules from transaction datbases can reveal the relationship of differ-ent levels of the taxonomies and display cyclic variations over time.Information about such variations is great use of better identifying trends in associations and forecast-ing.Because cyclic rules are quite sensitive to a littlenoise,this paper uses the noise-ratio as the criterion of i-dentifing cydclic itemsets for dealing with the problem and utilizes the cycle-pruning technique to reduce the comput-ing time of the data mining process by exploiting the real-tionship between the cycle and generalized frequent item-sets.The paper gives the algorithm of mining cyclic gen-eralized itemsets(CGI).Experiment shows that the CGI algorithm can efficiently yield results.展开更多
Association rule mining methods, as a set of important data mining tools, could be used for mining spatial association rules of spatial data. However, applications of these methods are limited for mining results conta...Association rule mining methods, as a set of important data mining tools, could be used for mining spatial association rules of spatial data. However, applications of these methods are limited for mining results containing large number of redundant rules. In this paper, a new method named Geo-Filtered Association Rules Mining(GFARM) is proposed to effectively eliminate the redundant rules. An application of GFARM is performed as a case study in which association rules are discovered between building land distribution and potential driving factors in Wuhan, China from 1995 to 2015. Ten sets of regular sampling grids with different sizes are used for detecting the influence of multi-scales on GFARM. Results show that the proposed method can filter 50%–70% of redundant rules. GFARM is also successful in discovering spatial association pattern between building land distribution and driving factors.展开更多
Data-mining techniques have been developed to turn data into useful task-oriented knowledge. Most algorithms for mining association rules identify relationships among transactions using binary values and find rules at...Data-mining techniques have been developed to turn data into useful task-oriented knowledge. Most algorithms for mining association rules identify relationships among transactions using binary values and find rules at a single-concept level. Extracting multilevel association rules in transaction databases is most commonly used in data mining. This paper proposes a multilevel fuzzy association rule mining model for extraction of implicit knowledge which stored as quantitative values in transactions. For this reason it uses different support value at each level as well as different membership function for each item. By integrating fuzzy-set concepts, data-mining technologies and multiple-level taxonomy, our method finds fuzzy association rules from transaction data sets. This approach adopts a top-down progressively deepening approach to derive large itemsets and also incorporates fuzzy boundaries instead of sharp boundary intervals. Comparing our method with previous ones in simulation shows that the proposed method maintains higher precision, the mined rules are closer to reality, and it gives ability to mine association rules at different levels based on the user’s tendency as well.展开更多
The Apriori algorithm is a classical method of association rules mining.Based on analysis of this theory,the paper provides an improved Apriori algorithm.The paper puts foward with algorithm combines HASH table techni...The Apriori algorithm is a classical method of association rules mining.Based on analysis of this theory,the paper provides an improved Apriori algorithm.The paper puts foward with algorithm combines HASH table technique and reduction of candidate item sets to enhance the usage efficiency of resources as well as the individualized service of the data library.展开更多
The traditional generalization-based knowledge discovery method is introduced. A new kind of multilevel spatial association of the rules mining method based on the cloud model is presented. The cloud model integrates ...The traditional generalization-based knowledge discovery method is introduced. A new kind of multilevel spatial association of the rules mining method based on the cloud model is presented. The cloud model integrates the vague and random use of linguistic terms in a unified way. With these models, spatial and nonspatial attribute values are well generalized at multiple levels, allowing discovery of strong spatial association rules. Combining the cloud model based method with Apriori algorithms for mining association rules from a spatial database shows benefits in being effective and flexible.展开更多
We discuss the basic intrusion detection techniques, and focus on how to apply association rules to intrusion detection. Begin with analyzing some close relations between user’s behaviors, we discuss the mining algor...We discuss the basic intrusion detection techniques, and focus on how to apply association rules to intrusion detection. Begin with analyzing some close relations between user’s behaviors, we discuss the mining algorithm of association rules and apply to detect anomaly in IDS. Moreover, according to the characteristic of intrusion detection, we optimize the mining algorithm of association rules, and use fuzzy logic to improve the system performance.展开更多
Based on the rough set theory which is a powerful tool in dealing with vagueness and uncertainty, an algorithm to mine association rules in incomplete information systems was presented and the support and confidence w...Based on the rough set theory which is a powerful tool in dealing with vagueness and uncertainty, an algorithm to mine association rules in incomplete information systems was presented and the support and confidence were redefined. The algorithm can mine the association rules with decision attributes directly without processing missing values. Using the incomplete dataset Mushroom from UCI machine learning repository, the new algorithm was compared with the classical association rules mining algorithm based on Apriori from the number of rules extracted, testing accuracy and execution time. The experiment results show that the new algorithm has advantages of short execution time and high accuracy.展开更多
The typical model, which involves the measures: support, confidence, and interest, is often adapted to mining association rules. In the model, the related parameters are usually chosen by experience; consequently, the...The typical model, which involves the measures: support, confidence, and interest, is often adapted to mining association rules. In the model, the related parameters are usually chosen by experience; consequently, the number of useful rules is hard to estimate. If the number is too large, we cannot effectively extract the meaningful rules. This paper analyzes the meanings of the parameters and designs a variety of equations between the number of rules and the parameters by using regression method. Finally, we experimentally obtain a preferable regression equation. This paper uses multiple correlation coeficients to test the fitting efiects of the equations and uses significance test to verify whether the coeficients of parameters are significantly zero or not. The regression equation that has a larger multiple correlation coeficient will be chosen as the optimally fitted equation. With the selected optimal equation, we can predict the number of rules under the given parameters and further optimize the choice of the three parameters and determine their ranges of values.展开更多
This paper devises a scheme which can discover the state association rules of process object. The scheme aims to dig the hidden close relationships of different links in process object. We adopt a method based on diff...This paper devises a scheme which can discover the state association rules of process object. The scheme aims to dig the hidden close relationships of different links in process object. We adopt a method based on difference and extremum to compute the timing. Clustering is used to classifying the adjusted data, and the next is associating the clusters. Based on the rules of clusters, we produce the rules of links. Association degrees between each two links can be determined. It is easy to get association chains according to the degree. The state association rules that can be obtained in accordance with association rules are the final results. Some industry guidance can be directly summarized from the state association rules, and we can apply the guidance to improve the efficiency of production and operational in allied industries.展开更多
These days, health care systems such as pharmacies and drugstores normally produce high volumes of data. Consequently, utilizing data mining methods in health care systems has become a conventional process. In this re...These days, health care systems such as pharmacies and drugstores normally produce high volumes of data. Consequently, utilizing data mining methods in health care systems has become a conventional process. In this research, Apriori algorithm has been applied to perform data mining using the data obtained from the prescriptions ordered within a pharmacy. Ten association rules were achieved from the assigned pharmaceutical drugs in those prescriptions using the aforementioned Apriori algorithm. The accuracy of these rules is also manually studied and reviewed by a physician. Among these association rules, Vitamin D and Calcium pills are the most interrelated medications, and Omeprazole and Metronidazole rankd second in terms of association. The results of this study provide useful feedback information about associations among drugs.展开更多
Hotspots (active fires) indicate spatial distribution of fires. A study on determining influence factors for hotspot occurrence is essential so that fire events can be predicted based on characteristics of a certain a...Hotspots (active fires) indicate spatial distribution of fires. A study on determining influence factors for hotspot occurrence is essential so that fire events can be predicted based on characteristics of a certain area. This study discovers the possible influence factors on the occurrence of fire events using the association rule algorithm namely Apriori in the study area of Rokan Hilir Riau Province Indonesia. The Apriori algorithm was applied on a forest fire dataset which containeddata on physical environment (land cover, river, road and city center), socio-economic (income source, population, and number of school), weather (precipitation, wind speed, and screen temperature), and peatlands. The experiment results revealed 324 multidimensional association rules indicating relationships between hotspots occurrence and other factors.The association among hotspots occurrence with other geographical objects was discovered for the minimum support of 10% and the minimum confidence of 80%. The results show that strong relations between hotspots occurrence and influence factors are found for the support about 12.42%, the confidence of 1, and the lift of 2.26. These factors are precipitation greater than or equal to 3 mm/day, wind speed in [1m/s, 2m/s), non peatland area, screen temperature in [297K, 298K), the number of school in 1 km2 less than or equal to 0.1, and the distance of each hotspot to the nearest road less than or equal to 2.5 km.展开更多
In the privacy preservation of association rules, sensitivity analysis should be reported after the quantification of items in terms of their occurrence. The traditional methodologies, used for preserving confidential...In the privacy preservation of association rules, sensitivity analysis should be reported after the quantification of items in terms of their occurrence. The traditional methodologies, used for preserving confidentiality of association rules, are based on the assumptions while safeguarding susceptible information rather than recognition of insightful items. Therefore, it is time to go one step ahead in order to remove such assumptions in the protection of responsive information especially in XML association rule mining. Thus, we focus on this central and highly researched area in terms of generating XML association rule mining without arguing on the disclosure risks involvement in such mining process. Hence, we described the identification of susceptible items in order to hide the confidential information through a supervised learning technique. These susceptible items show the high dependency on other items that are measured in terms of statistical significance with Bayesian Network. Thus, we proposed two methodologies based on items probabilistic occurrence and mode of items. Additionally, all this information is modeled and named PPDM (Privacy Preservation in Data Mining) model for XARs. Furthermore, the PPDM model is helpful for sharing markets information among competitors with a lower chance of generating monopoly. Finally, PPDM model introduces great accuracy in computing sensitivity of items and opens new dimensions to the academia for the standardization of such NP-hard problems.展开更多
Frequent item sets mining plays an important role in association rules mining. A variety of algorithms for finding frequent item sets in very large transaction databases have been developed. Although many techniques w...Frequent item sets mining plays an important role in association rules mining. A variety of algorithms for finding frequent item sets in very large transaction databases have been developed. Although many techniques were proposed for maintenance of the discovered rules when new transactions are added, little work is done for maintaining the discovered rules when some transactions are deleted from the database. Updates are fundamental aspect of data management. In this paper, a decremental association rules mining algorithm is present for updating the discovered association rules when some transactions are removed from the original data set. Extensive experiments were conducted to evaluate the performance of the proposed algorithm. The results show that the proposed algorithm is efficient and outperforms other well-known algorithms.展开更多
To date, not many studies have been conducted on criminal prediction. In this study, the criminal data related to city S is divided into a training data set and a validation data set at a 1:1 ratio in light of the per...To date, not many studies have been conducted on criminal prediction. In this study, the criminal data related to city S is divided into a training data set and a validation data set at a 1:1 ratio in light of the personal tag data and the travel and accommodation data of criminals and ordinary people in city S. Firstly, the FP-growth algorithm is adopted to calculate association rules between the criminals and the ordinary people in their travel and hotel accommodation data, in order to discover criminal suspects based on association rules. Secondly, the DBSCAN algorithm is employed for clustering of the tag data of the criminals and the ordinary people, followed by similarity calculation, in order to discover criminal suspects based on tag clustering. Lastly, intersection operation is performed on the above two sets of criminal suspects, and the resulting intersection is verified against the criminal validation set for elimination of criminals who appear in the intersection so as to obtain final criminal suspects. Results show that a set of 648 criminal suspects is retrieved based on the association rules calculated by the FP-growth algorithm, while a set of 973 criminal suspects is retrieved based on DBSCAN clustering and cosine similarity of the personal tags;the number of criminal suspects is narrowed down to 567 after the intersection operation of the two sets, and 419 of the 567 criminal suspects are further verified to be criminals using the validation set, thereby leaving the other 148 to be the final criminal suspects and giving a prediction accuracy of 73.9%. The data mining method of criminal suspects based on association rules and tag clustering in this study has been successfully applied to the police system of city S, and the experiment proves the effectiveness of this method in detecting criminal suspects.展开更多
基金The APC was funded by the Deanship of Scientific Research,Saudi Electronic University.
文摘Association rules’learning is a machine learning method used in finding underlying associations in large datasets.Whether intentionally or unintentionally present,noise in training instances causes overfitting while building the classifier and negatively impacts classification accuracy.This paper uses instance reduction techniques for the datasets before mining the association rules and building the classifier.Instance reduction techniques were originally developed to reduce memory requirements in instance-based learning.This paper utilizes them to remove noise from the dataset before training the association rules classifier.Extensive experiments were conducted to assess the accuracy of association rules with different instance reduction techniques,namely:DecrementalReduction Optimization Procedure(DROP)3,DROP5,ALL K-Nearest Neighbors(ALLKNN),Edited Nearest Neighbor(ENN),and Repeated Edited Nearest Neighbor(RENN)in different noise ratios.Experiments show that instance reduction techniques substantially improved the average classification accuracy on three different noise levels:0%,5%,and 10%.The RENN algorithm achieved the highest levels of accuracy with a significant improvement on seven out of eight used datasets from the University of California Irvine(UCI)machine learning repository.The improvements were more apparent in the 5%and the 10%noise cases.When RENN was applied,the average classification accuracy for the eight datasets in the zero-noise test enhanced from 70.47%to 76.65%compared to the original test.The average accuracy was improved from 66.08%to 77.47%for the 5%-noise case and from 59.89%to 77.59%in the 10%-noise case.Higher confidence was also reported in building the association rules when RENN was used.The above results indicate that RENN is a good solution in removing noise and avoiding overfitting during the construction of the association rules classifier,especially in noisy domains.
基金General Project of National Natural Science Foundation of China(No.81573970)BeijingMunicipal Natural Science Foundation(No.7202118)。
文摘Objective:To analyze the rule of prescribing traditional Chinese medicine for treating pneumoconiosis,so as to provide reference for differential diagnosis and treatment of pneumoconiosis as well as for the development of new drugs for treatingthe disease.Methods:We searched China National Knowledge Infrastructure,Wanfang Database and VIP Chinese PublicationDatabase to retrieve relevant literatures which were then screened according to the enrollment criteria to establish a prescriptiondatabase of traditional Chinese medicine for the treatment of pneumoconiosis.The inheritance calculation platform of traditionalChinese medicine was used to analyze the prescribing rule of traditional Chinese medicine in the treatment of pneumoconiosisbased on association rules,k-means clustering algorithm and regression model analysis.Results:A total of 131 related literature were preliminarily selected,from which 97 prescriptions of traditional Chinese medicine with a total of 195 herbs were included.The most frequently prescribed herbs included Radix astragali,Platycodon grandiflorum,Pinellia ternata,licorice,Codonopsispilosula,Salvia miltiorrhiza,bitter almond etc.A total of 14 association rules,13 high-frequency herb pairs were found and 5groups of formulas were revealed by cluster analysis.Conclusion:The prescriptions for the treatment of pneumoconiosis are mainly composed of herbs for tonifying deficiency,resolving phlegm,relieving cough and asthma,activating blood circulation and removingblood stasis,which are supplemented with herbs for clearing heat,relieving appearance,regulating qi,promoting waterand permeating dampness,etc.,The prescribing rules reflect the basic pathological characteristics of lung deficiency and collateral arthralgia in pneumoconiosis,which provides some ideas for the clinical differentiation and treatment of pneumoconiosis in traditionalChinese medicine.It also provides reference for the research and development of new treatment methods.
基金Supported by Science and Technology Research and Development Project of Chengde City,Hebei Province(201706A043)Young Scholar Program of Hebei Pharmaceutical Association Hospital Pharmaceutical Research Project(2020—Hbsyxhqn0029).
文摘[Objectives]To explore the compatibility rules of neonatal parenteral nutrition(PN)prescriptions based on association rules and hierarchical cluster analysis,thereby providing a reference for standardizing neonatal parenteral nutrition supportive therapy.[Methods]The data about neonatal PN formulations prepared by the Pharmacy Intravenous Admixture Services(PIVAS)of the Affiliated Hospital of Chengde Medical University from July 2015 to June 2021 were collected.The general information of the prescriptions and the frequency of drug use were analyzed with Excel 2019;the boxplot of drug dosing was drawn using GraphPad 8.0 software;and SPSS Modeler 18.0 and SPSS Statistics 26.0 were used to perform association rules and hierarchical cluster analysis.[Results]A total of 11488 PN prescriptions were collected from 1421 newborns,involving 18 kinds of drugs,which were divided into 11 types of nutrients.Association rules analysis yielded 84 nutrient substance combinations.The combination of fat emulsion-water-soluble vitamins-fat-soluble vitamins-glucose-amino acids had the highest confidence(99.95%).The hierarchical cluster analysis divided nutrients into 5 types.[Conclusions]The prescriptions of PN for newborns were composed of five types of nutrients:amino acids,fat emulsion,glucose,water-soluble vitamins,and fat-soluble vitamins.According to the lack of electrolytes and trace elements,appropriate drugs can be chosen to meet nutritional demands.This study provides reference basis for reasonable selection of drugs for neonatal PN prescriptions and further standardization of PN supportive therapy in newborns.
文摘Data mining techniques offer great opportunities for developing ethics lines whose main aim is to ensure improvements and compliance with the values, conduct and commitments making up the code of ethics. The aim of this study is to suggest a process for exploiting the data generated by the data generated and collected from an ethics line by extracting rules of association and applying the Apriori algorithm. This makes it possible to identify anomalies and behaviour patterns requiring action to review, correct, promote or expand them, as appropriate.
文摘As data mining more and more popular applied in computer system,the quality as-surance test of its software would be get more and more attention.However,because of the ex-istence of the 'oracle' problem,the traditional test method is not ease fit for the application program in the field of the data mining.In this paper,based on metamorphic testing,a software testing method is proposed in the field of the data mining,makes an association rules algorithm as the specific case,and constructs the metamorphic relation on the algorithm.Experiences show that the method can achieve the testing target and is feasible to apply to other domain.
文摘Although association rule mining is an important pattern recognition and data analysis technique, extracting and finding significant rules from a large collection has always been challenging. The ability of information visualization to enable users to gain an understanding of high dimensional and large-scale data can play a major role in the exploration, identification, and interpretation of association rules. In this paper, we propose a method that provides multiple views of the association rules, linked together through a filtering mechanism. A visual inspection of the entire association rule set is enabled within a matrix view. Items of interest can be selected, resulting in their corresponding association rules being shown in a graph view. At any time, individual rules can be selected in either view, resulting in their information being shown in the detail view. The fundamental premise in this work is that by providing such a visual and interactive representation of the association rules, users will be able to find important rules quickly and easily, even as the number of rules that must be inspected becomes large. A user evaluation was conducted which validates this premise.
文摘Discovering cyclic generalized association rules from transaction datbases can reveal the relationship of differ-ent levels of the taxonomies and display cyclic variations over time.Information about such variations is great use of better identifying trends in associations and forecast-ing.Because cyclic rules are quite sensitive to a littlenoise,this paper uses the noise-ratio as the criterion of i-dentifing cydclic itemsets for dealing with the problem and utilizes the cycle-pruning technique to reduce the comput-ing time of the data mining process by exploiting the real-tionship between the cycle and generalized frequent item-sets.The paper gives the algorithm of mining cyclic gen-eralized itemsets(CGI).Experiment shows that the CGI algorithm can efficiently yield results.
基金Under the auspices of Special Fund of Ministry of Land and Resources of China in Public Interest(No.201511001)
文摘Association rule mining methods, as a set of important data mining tools, could be used for mining spatial association rules of spatial data. However, applications of these methods are limited for mining results containing large number of redundant rules. In this paper, a new method named Geo-Filtered Association Rules Mining(GFARM) is proposed to effectively eliminate the redundant rules. An application of GFARM is performed as a case study in which association rules are discovered between building land distribution and potential driving factors in Wuhan, China from 1995 to 2015. Ten sets of regular sampling grids with different sizes are used for detecting the influence of multi-scales on GFARM. Results show that the proposed method can filter 50%–70% of redundant rules. GFARM is also successful in discovering spatial association pattern between building land distribution and driving factors.
文摘Data-mining techniques have been developed to turn data into useful task-oriented knowledge. Most algorithms for mining association rules identify relationships among transactions using binary values and find rules at a single-concept level. Extracting multilevel association rules in transaction databases is most commonly used in data mining. This paper proposes a multilevel fuzzy association rule mining model for extraction of implicit knowledge which stored as quantitative values in transactions. For this reason it uses different support value at each level as well as different membership function for each item. By integrating fuzzy-set concepts, data-mining technologies and multiple-level taxonomy, our method finds fuzzy association rules from transaction data sets. This approach adopts a top-down progressively deepening approach to derive large itemsets and also incorporates fuzzy boundaries instead of sharp boundary intervals. Comparing our method with previous ones in simulation shows that the proposed method maintains higher precision, the mined rules are closer to reality, and it gives ability to mine association rules at different levels based on the user’s tendency as well.
文摘The Apriori algorithm is a classical method of association rules mining.Based on analysis of this theory,the paper provides an improved Apriori algorithm.The paper puts foward with algorithm combines HASH table technique and reduction of candidate item sets to enhance the usage efficiency of resources as well as the individualized service of the data library.
文摘The traditional generalization-based knowledge discovery method is introduced. A new kind of multilevel spatial association of the rules mining method based on the cloud model is presented. The cloud model integrates the vague and random use of linguistic terms in a unified way. With these models, spatial and nonspatial attribute values are well generalized at multiple levels, allowing discovery of strong spatial association rules. Combining the cloud model based method with Apriori algorithms for mining association rules from a spatial database shows benefits in being effective and flexible.
文摘We discuss the basic intrusion detection techniques, and focus on how to apply association rules to intrusion detection. Begin with analyzing some close relations between user’s behaviors, we discuss the mining algorithm of association rules and apply to detect anomaly in IDS. Moreover, according to the characteristic of intrusion detection, we optimize the mining algorithm of association rules, and use fuzzy logic to improve the system performance.
基金Projects(10871031, 60474070) supported by the National Natural Science Foundation of ChinaProject(07A001) supported by the Scientific Research Fund of Hunan Provincial Education Department, China
文摘Based on the rough set theory which is a powerful tool in dealing with vagueness and uncertainty, an algorithm to mine association rules in incomplete information systems was presented and the support and confidence were redefined. The algorithm can mine the association rules with decision attributes directly without processing missing values. Using the incomplete dataset Mushroom from UCI machine learning repository, the new algorithm was compared with the classical association rules mining algorithm based on Apriori from the number of rules extracted, testing accuracy and execution time. The experiment results show that the new algorithm has advantages of short execution time and high accuracy.
基金supported by the National Natural Science Foundation of China (No. J07240003, No. 60773084, No. 60603023)National Research Fund for the Doctoral Program of Higher Education of China (No. 20070151009)
文摘The typical model, which involves the measures: support, confidence, and interest, is often adapted to mining association rules. In the model, the related parameters are usually chosen by experience; consequently, the number of useful rules is hard to estimate. If the number is too large, we cannot effectively extract the meaningful rules. This paper analyzes the meanings of the parameters and designs a variety of equations between the number of rules and the parameters by using regression method. Finally, we experimentally obtain a preferable regression equation. This paper uses multiple correlation coeficients to test the fitting efiects of the equations and uses significance test to verify whether the coeficients of parameters are significantly zero or not. The regression equation that has a larger multiple correlation coeficient will be chosen as the optimally fitted equation. With the selected optimal equation, we can predict the number of rules under the given parameters and further optimize the choice of the three parameters and determine their ranges of values.
文摘This paper devises a scheme which can discover the state association rules of process object. The scheme aims to dig the hidden close relationships of different links in process object. We adopt a method based on difference and extremum to compute the timing. Clustering is used to classifying the adjusted data, and the next is associating the clusters. Based on the rules of clusters, we produce the rules of links. Association degrees between each two links can be determined. It is easy to get association chains according to the degree. The state association rules that can be obtained in accordance with association rules are the final results. Some industry guidance can be directly summarized from the state association rules, and we can apply the guidance to improve the efficiency of production and operational in allied industries.
文摘These days, health care systems such as pharmacies and drugstores normally produce high volumes of data. Consequently, utilizing data mining methods in health care systems has become a conventional process. In this research, Apriori algorithm has been applied to perform data mining using the data obtained from the prescriptions ordered within a pharmacy. Ten association rules were achieved from the assigned pharmaceutical drugs in those prescriptions using the aforementioned Apriori algorithm. The accuracy of these rules is also manually studied and reviewed by a physician. Among these association rules, Vitamin D and Calcium pills are the most interrelated medications, and Omeprazole and Metronidazole rankd second in terms of association. The results of this study provide useful feedback information about associations among drugs.
文摘Hotspots (active fires) indicate spatial distribution of fires. A study on determining influence factors for hotspot occurrence is essential so that fire events can be predicted based on characteristics of a certain area. This study discovers the possible influence factors on the occurrence of fire events using the association rule algorithm namely Apriori in the study area of Rokan Hilir Riau Province Indonesia. The Apriori algorithm was applied on a forest fire dataset which containeddata on physical environment (land cover, river, road and city center), socio-economic (income source, population, and number of school), weather (precipitation, wind speed, and screen temperature), and peatlands. The experiment results revealed 324 multidimensional association rules indicating relationships between hotspots occurrence and other factors.The association among hotspots occurrence with other geographical objects was discovered for the minimum support of 10% and the minimum confidence of 80%. The results show that strong relations between hotspots occurrence and influence factors are found for the support about 12.42%, the confidence of 1, and the lift of 2.26. These factors are precipitation greater than or equal to 3 mm/day, wind speed in [1m/s, 2m/s), non peatland area, screen temperature in [297K, 298K), the number of school in 1 km2 less than or equal to 0.1, and the distance of each hotspot to the nearest road less than or equal to 2.5 km.
文摘In the privacy preservation of association rules, sensitivity analysis should be reported after the quantification of items in terms of their occurrence. The traditional methodologies, used for preserving confidentiality of association rules, are based on the assumptions while safeguarding susceptible information rather than recognition of insightful items. Therefore, it is time to go one step ahead in order to remove such assumptions in the protection of responsive information especially in XML association rule mining. Thus, we focus on this central and highly researched area in terms of generating XML association rule mining without arguing on the disclosure risks involvement in such mining process. Hence, we described the identification of susceptible items in order to hide the confidential information through a supervised learning technique. These susceptible items show the high dependency on other items that are measured in terms of statistical significance with Bayesian Network. Thus, we proposed two methodologies based on items probabilistic occurrence and mode of items. Additionally, all this information is modeled and named PPDM (Privacy Preservation in Data Mining) model for XARs. Furthermore, the PPDM model is helpful for sharing markets information among competitors with a lower chance of generating monopoly. Finally, PPDM model introduces great accuracy in computing sensitivity of items and opens new dimensions to the academia for the standardization of such NP-hard problems.
文摘Frequent item sets mining plays an important role in association rules mining. A variety of algorithms for finding frequent item sets in very large transaction databases have been developed. Although many techniques were proposed for maintenance of the discovered rules when new transactions are added, little work is done for maintaining the discovered rules when some transactions are deleted from the database. Updates are fundamental aspect of data management. In this paper, a decremental association rules mining algorithm is present for updating the discovered association rules when some transactions are removed from the original data set. Extensive experiments were conducted to evaluate the performance of the proposed algorithm. The results show that the proposed algorithm is efficient and outperforms other well-known algorithms.
文摘To date, not many studies have been conducted on criminal prediction. In this study, the criminal data related to city S is divided into a training data set and a validation data set at a 1:1 ratio in light of the personal tag data and the travel and accommodation data of criminals and ordinary people in city S. Firstly, the FP-growth algorithm is adopted to calculate association rules between the criminals and the ordinary people in their travel and hotel accommodation data, in order to discover criminal suspects based on association rules. Secondly, the DBSCAN algorithm is employed for clustering of the tag data of the criminals and the ordinary people, followed by similarity calculation, in order to discover criminal suspects based on tag clustering. Lastly, intersection operation is performed on the above two sets of criminal suspects, and the resulting intersection is verified against the criminal validation set for elimination of criminals who appear in the intersection so as to obtain final criminal suspects. Results show that a set of 648 criminal suspects is retrieved based on the association rules calculated by the FP-growth algorithm, while a set of 973 criminal suspects is retrieved based on DBSCAN clustering and cosine similarity of the personal tags;the number of criminal suspects is narrowed down to 567 after the intersection operation of the two sets, and 419 of the 567 criminal suspects are further verified to be criminals using the validation set, thereby leaving the other 148 to be the final criminal suspects and giving a prediction accuracy of 73.9%. The data mining method of criminal suspects based on association rules and tag clustering in this study has been successfully applied to the police system of city S, and the experiment proves the effectiveness of this method in detecting criminal suspects.