Journal articles
34 articles found
Web Mining Model Based on Rough Set Theory
1
Authors: 吴冰 (Wu Bing), 赵林度 (Zhao Lindu). Journal of Southeast University (English Edition) [EI, CAS], 2002, No. 1, pp. 54-58 (5 pages).
Because Web log files contain a great deal of valuable information, the results of Web mining can be used to improve decision making in electronic commerce (EC) operation and management. Because Web log files are ambiguous and voluminous, a minimal decision-making model based on rough set theory is presented for Web mining, and an example is given to explain the model. The model simplifies the decision table so that its minimal solution can be obtained. From this minimal solution, the corresponding decisions for individualized service can be made in sequence. The authors describe Web mining based on rough set theory as an original and distinctive method at the time of writing.
Keywords: web mining; rough sets; electronic commerce; knowledge reasoning; web log
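The abstract does not give the model's reduction steps. The Python sketch below is a rough, hypothetical illustration of the rough-set idea the model relies on, not the authors' model. It uses brute force to find a smallest attribute subset (a reduct) that leaves the positive region of a decision table unchanged. The table columns (`visits`, `time`, `ref`, `buy`) are invented for the example.

```python
from itertools import combinations

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    classes = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(i)
    return list(classes.values())

def positive_region(rows, cond, decision):
    """Indices of rows whose condition class maps to a single decision value."""
    pos = set()
    for block in partition(rows, cond):
        if len({rows[i][decision] for i in block}) == 1:
            pos.update(block)
    return pos

def minimal_reduct(rows, cond, decision):
    """Smallest non-empty subset of cond that preserves the positive region."""
    full = positive_region(rows, cond, decision)
    for k in range(1, len(cond) + 1):
        for subset in combinations(cond, k):  # exponential; fine for a toy table
            if positive_region(rows, list(subset), decision) == full:
                return list(subset)
    return list(cond)

# Hypothetical decision table derived from a Web log.
rows = [
    {"visits": "high", "time": "long", "ref": "search", "buy": "yes"},
    {"visits": "high", "time": "short", "ref": "search", "buy": "yes"},
    {"visits": "low", "time": "long", "ref": "direct", "buy": "no"},
    {"visits": "low", "time": "short", "ref": "search", "buy": "no"},
]
print(minimal_reduct(rows, ["visits", "time", "ref"], "buy"))  # ['visits']
```

In this toy table, `visits` alone already decides `buy`, so the other two columns can be dropped from the decision table.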
Applied Approaches of Rough Set Theory to Web Mining (cited by 1)
2
Authors: 孙铁利 (Sun Tieli), 教巍巍 (Jiao Weiwei). Journal of Donghua University (English Edition) [EI, CAS], 2006, No. 6, pp. 117-120 (4 pages).
Rough set theory is a relatively new soft computing tool that has attracted much attention from researchers around the world. It can handle incomplete and uncertain information, and it has been applied successfully in many areas. This paper introduces the basic concepts of rough sets and discusses their applications in Web mining, with particular emphasis on applications of rough set theory to intelligent information processing.
Keywords: rough set; web mining; knowledge discovery; uncertainty
Parallel Web Mining System Based on Cloud Platform (cited by 1)
3
Authors: Shengmei Luo, Qing He, Lixia Liu, Xiang Ao, Ning Li, Fuzhen Zhuang. ZTE Communications, 2012, No. 4, pp. 45-53 (9 pages).
Traditional machine-learning algorithms struggle to handle the exceedingly large amount of data being generated on the Internet. Real-world applications urgently need machine-learning algorithms that can handle large-scale, high-dimensional text data. Cloud computing delivers computing and storage as a service to a heterogeneous community of recipients, and it has recently attracted much interest in industry and academia. Most previous work on cloud platforms focuses only on parallel algorithms for structured data. In this paper, we focus on the parallel implementation of web-mining algorithms and develop a parallel web-mining system with three subsystems: a parallel web crawler; parallel text extract, transform and load (ETL) and modeling; and parallel text mining and application. The complete system supports various real-world web-mining applications on massive data.
Keywords: web mining; large scale; high volume; high dimension; cloud computing
The design and implementation of web mining in web sites security (cited by 2)
4
Authors: LI Jian, ZHANG Guo-yin, GU Guo-chang, LI Jian-li (College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China). Journal of Marine Science and Application, 2003, No. 1, pp. 81-86 (6 pages).
Backdoors and information leaks in Web servers can be detected by applying Web mining techniques to abnormal Web log and Web application log data. This enhances Web server security and prevents damage from illegal access. First, a system is proposed for discovering patterns of information leakage in CGI scripts from Web log data. Second, these patterns are provided to system administrators so that they can modify their code and enhance Web site security. Two aspects are described. The first combines the Web application log with the Web log to extract more information, so that Web data mining can discover information that firewalls and Information Detection Systems cannot find. The second is an operation module for the Web site that enhances its security. To cluster server sessions, a density-based clustering technique is used to reduce resource cost and improve efficiency.
Keywords: data mining; web log mining; web sites security; density-based clustering
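The abstract names density-based clustering for server sessions but gives no details. The Python sketch below is a plain textbook DBSCAN, not the authors' implementation. Representing each session as a numeric feature vector is an assumption made for illustration, and so are the two features used here.

```python
import math

def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) per point, or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Includes i itself, as in the usual DBSCAN definition.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:  # only core points expand the cluster
                queue.extend(more)
    return labels

# Hypothetical session features: (requests per minute, error responses).
sessions = [(1, 1), (1.2, 1.1), (0.9, 1), (8, 8), (8.1, 8.2), (7.9, 8), (50, 50)]
print(dbscan(sessions, eps=0.5, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated session is labelled noise (`-1`). Flagging sessions like this is one plausible way abnormal access could surface, though the paper does not state that this is how it uses the clusters.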
Mining Interesting Knowledge from Web-Log (cited by 1)
5
Authors: ZHOU Hong-fang, FENG Bo-qin, HEI Xin-hong, LU Lin-tao. Wuhan University Journal of Natural Sciences [EI, CAS], 2004, No. 5, pp. 569-574 (6 pages).
Web logs contain a great deal of information about user activities on the Internet. Mining users' browsing-interest patterns effectively is an important and challenging research topic. Based on an analysis of the advantages and disadvantages of existing algorithms, we propose a new concept: support-interest. Its key insight is that visitors backtrack if they do not find information where they expect it, and the point from which they backtrack is the expected location of the page. We present a User Access Matrix and a corresponding algorithm for discovering such expected locations, which can handle page caching by the browser. The URL-URL matrix is sparse and can be represented as a list of 3-tuples, so user-preferred sub-paths can be mined by computing on this matrix. All sub-paths are then merged to form user-preferred paths. Experiments showed that the method is accurate and scalable. It is suitable for website-based applications, such as optimizing a website's topological structure or designing personalized services.
CLC number: TP 391. Foundation item: Supported by the National High Technology Development Program of China (863 Program) (2001AA113182). Biography: ZHOU Hong-fang (1976-), female, Ph.D. candidate; research direction: data mining and knowledge discovery in databases.
Keywords: web mining; user preferred path; web-log; support-interest; personalized services
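The abstract does not spell out the User Access Matrix algorithm. The Python sketch below illustrates only two ingredients it mentions, under simplifying assumptions:

- storing the sparse URL-URL matrix as a list of (from, to, count) 3-tuples;
- spotting backtrack points.

It assumes every revisit shows up in the log, so it ignores the browser caching that the paper handles. `url_url_triples` and `backtrack_points` are hypothetical names.

```python
from collections import Counter

def url_url_triples(sessions):
    """Sparse URL-URL transition matrix as sorted (from_url, to_url, count) 3-tuples."""
    counts = Counter()
    for pages in sessions:
        for a, b in zip(pages, pages[1:]):
            counts[(a, b)] += 1
    return sorted((a, b, n) for (a, b), n in counts.items())

def backtrack_points(pages):
    """Pages visited right before a return to a page already seen in the session."""
    points = []
    for i in range(1, len(pages) - 1):
        if pages[i + 1] in pages[:i]:
            points.append(pages[i])
    return points

print(url_url_triples([["/", "/a", "/a/x"], ["/", "/a", "/b"]]))
# [('/', '/a', 2), ('/a', '/a/x', 1), ('/a', '/b', 1)]
print(backtrack_points(["/", "/a", "/a/x", "/a", "/b", "/b/y"]))  # ['/a/x']
```

In the second session the visitor goes down to `/a/x`, comes back, and finally reaches `/b/y`. That makes `/a/x` a candidate expected location for `/b/y`.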
Optimization of Web Search Engine and Its Application to Web Mining (cited by 1)
6
Authors: CHEN Hao, ZOU Beiji, BIAN Naizheng. Wuhan University Journal of Natural Sciences [CAS], 2009, No. 2, pp. 115-118 (4 pages).
With the explosive growth of information sources on the World Wide Web, combining the results of multiple search engines has become an important problem. This paper proposes a search strategy for search engines in Web mining based on genetic simulated annealing. According to the proposed strategy, there are important relationships among Web statistical studies, search engines and optimization techniques. Experiments on the presented queries demonstrate the relevance of the approach. Comparing the quality of the output pages with that of the originally downloaded pages shows that results improve as the number of iterations increases, with reasonable execution time.
Keywords: web mining; genetic algorithm; simulated annealing
Automatic Clustering of User Behaviour Profiles for Web Recommendation System
7
Authors: S. Sadesh, Osamah Ibrahim Khalaf, Mohammad Shorfuzzaman, Abdulmajeed Alsufyani, K. Sangeetha, Mueen Uddin. Intelligent Automation & Soft Computing [SCIE], 2023, No. 3, pp. 3365-3384 (20 pages).
Web usage mining, content mining and structure mining make up the web mining process. Web-Page Recommendation (WPR) systems developed by incorporating Data Mining Techniques (DMT) have not delivered improved filtering results to end users. Clustering based on clustered user profiles is delayed when its precision rate is low. Markov Chain Monte Carlo-Dynamic Clustering (MC2-DC) groups users with similar behavior based on the User Behavior Profile (UBP) model and dynamically updates the UBP. The Reversible-Jump Concept (RJC) reviews the history against the updated UBP and moves users to the appropriate clusters. Hamilton's Filtering Framework (HFF) filters user data through the Search Engine (SE), based on the personalized information in the automatically updated UBP. The Hamilton Filtered Regime Switching User Query Probability (HFRSUQP) passes the updated UBP forward so that users' interests can be filtered easily and accurately, improving WPR. A Probabilistic User Result Feature Ranking based on Gaussian Distribution (PURFR-GD) ranks user results in the web mining process. By using the Gaussian Distribution Function (GDF), PURFR-GD reduces the delay in the end-to-end workflow for SE personalization compared with various methods. Theoretical analysis and experimental results show that MC2-DC increases the accuracy of the updated UBP by 18.78%. HFRSUQP enabled Maximize Log-Likelihood (ML-L) increases of up to 15.28% in the User Personalized Information Search Retrieval Rate (UPISRT). For feature ranking, PURFR-GD achieves higher Classification Accuracy (CA) and Precision Ratio (PR) with minimal Execution Time (ET). Furthermore, the ranking performance of UPISRT improves by 20%.
Keywords: data mining; web mining process; search engine; web-page recommendation; accuracy
A Novel Incremental Mining Algorithm of Frequent Patterns for Web Usage Mining (cited by 1)
8
Authors: DONG Yihong, ZHUANG Yueting, TAI Xiaoying. Wuhan University Journal of Natural Sciences [CAS], 2007, No. 5, pp. 777-782 (6 pages).
Because a data warehouse changes frequently, incremental data can make previously mined knowledge obsolete. To maintain the discovered knowledge and patterns dynamically, this study presents IPARUC, a novel algorithm for updating global frequent patterns. First, IPARUC uses a rapid clustering method to divide the database into n parts, so that data within the same part are similar. Then, during insertion, the nodes in the tree are adjusted dynamically by "pruning and laying back" to keep them in descending order of frequency, so that they can be shared in a near-optimal way. Finally, the local frequent itemsets mined from each local dataset are merged into global frequent itemsets. The experimental results are very encouraging: IPARUC is more effective and efficient than the two methods it is compared with. IPARUC also has significant application potential in a prototype Web log analyzer for web usage mining. There it can help discover useful knowledge effectively and even help managers make decisions.
Keywords: incremental algorithm; association rule; frequent pattern tree; web usage mining
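IPARUC's tree adjustments ("pruning and laying back") are not described in enough detail to reproduce. The Python sketch below shows only the last step, merging local frequent itemsets into global ones, in the style of the classic Partition algorithm. The idea is that any itemset frequent in the whole database is frequent in at least one part. The union of the local results is therefore a complete candidate set, and one more pass over the data verifies it. Brute-force counting stands in for the frequent pattern tree, and the transactions are hypothetical.

```python
from itertools import combinations

def local_frequent(transactions, min_support, max_len=3):
    """All itemsets (up to max_len items) meeting a relative support within one part."""
    threshold = min_support * len(transactions)
    counts = {}
    for t in transactions:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(set(t)), k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s for s, c in counts.items() if c >= threshold}

def merge_global(parts, min_support, max_len=3):
    """Union the local results, then keep candidates that are frequent globally."""
    candidates = set()
    for part in parts:
        candidates |= local_frequent(part, min_support, max_len)
    everything = [set(t) for part in parts for t in part]
    threshold = min_support * len(everything)
    result = {}
    for cand in candidates:
        support = sum(1 for t in everything if set(cand) <= t)
        if support >= threshold:
            result[cand] = support
    return result

# Two hypothetical clusters of Web-log sessions (pages visited per session).
parts = [
    [["a", "b"], ["a", "c"], ["a", "b"]],
    [["b", "c"], ["a", "b", "c"]],
]
print(sorted(merge_global(parts, min_support=0.6).items()))
# [(('a',), 4), (('a', 'b'), 3), (('b',), 4), (('c',), 3)]
```

`('b', 'c')` is locally frequent in the second part but fails the global check (2 of 5 sessions). The verification pass is what removes such candidates.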
Semantic Session Analysis for Web Usage Mining (cited by 1)
9
Authors: ZHANG Hui, SONG Hantao, XU Xiaomei. Wuhan University Journal of Natural Sciences [CAS], 2007, No. 5, pp. 773-776 (4 pages).
A semantic session analysis method for partitioning Web usage logs is presented. A semantic Web usage log preparation model enriches usage logs with semantics. A Markov chain model based on ontology semantic measurement is used to identify which active session a request belongs to, and a competitive method is applied to determine the end of each session. Compared with other algorithms, the method detects additional successful sessions through semantic outlier analysis.
Keywords: web usage mining; web log preparation; session analysis
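The abstract does not give the model's equations. The Python sketch below is hypothetical and illustrates only the general idea. It appends a new request to the active session whose last page has the highest Markov transition probability to it, and opens a new session when no transition is likely enough. In the paper the probabilities come from ontology-based semantic measurement; here they are a hand-written dictionary, and the `min_prob` cut-off is an assumption. The competitive method for ending sessions is not shown.

```python
def assign_request(sessions, page, transition, min_prob=0.05):
    """Add page to the most probable active session, or start a new one.

    sessions:   list of active sessions, each a list of pages (modified in place).
    transition: dict mapping (previous_page, next_page) -> probability.
    """
    best, best_p = None, 0.0
    for session in sessions:
        p = transition.get((session[-1], page), 0.0)
        if p > best_p:
            best, best_p = session, p
    if best is None or best_p < min_prob:
        sessions.append([page])
    else:
        best.append(page)
    return sessions

# Hypothetical transition probabilities.
transition = {
    ("/home", "/sports"): 0.6,
    ("/sports", "/sports/nba"): 0.7,
    ("/home", "/finance"): 0.3,
    ("/finance", "/finance/stocks"): 0.8,
}
active = [["/home", "/sports"], ["/finance"]]
assign_request(active, "/finance/stocks", transition)
assign_request(active, "/weather", transition)
print(active)
# [['/home', '/sports'], ['/finance', '/finance/stocks'], ['/weather']]
```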
Evaluation Method of Web Site Structure Based on Web Structure Mining (cited by 1)
10
Authors: Li Jun-e (1), Zhou Dong-ru (2). Affiliations: (1) Computer Center, Wuhan University, Wuhan 430072, Hubei, China; (2) School of Computer, Wuhan University, Wuhan 430072, Hubei, China. Wuhan University Journal of Natural Sciences [CAS], 2003, No. 03A, pp. 791-796 (6 pages).
Web site structures have become more complex than before. When a Web site is designed without suitable models and methods, its structure depends on the designer's experience and may be poorly organized. From a software engineering point of view, every phase of the software life cycle must be evaluated before work on the next phase starts. It is therefore important to find methods for evaluating a Web structure before the site is completed. After studying related work on Web structure mining and analyzing the major structure mining methods (PageRank and Hub/Authority), this work proposes a PageRank-based method for evaluating Web structure in the design stage. A Web structure modeling language, WSML, is designed, and implementation strategies for a Web site structure evaluation system are given. Web structure mining has previously been used mainly in search engines; according to the authors, this is the first time it has been used to evaluate a Web structure during the design of a Web site. The work contributes to formalizing Web site design documents and to improving software engineering for large-scale Web sites, and the evaluation system is a practical tool for Web site construction.
Keywords: web structure mining; PageRank; web structure evaluation; modeling language for web structure
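The abstract does not describe WSML or the evaluation system. The Python sketch below shows only the standard PageRank power iteration that the method builds on, applied to a hypothetical site map drawn up at design time. How the paper turns these scores into an evaluation of the structure is not shown.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p, targets in links.items():
            for q in targets:
                new[q] += damping * rank[p] / len(targets)
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank

# Hypothetical design-stage site map.
site = {
    "index": ["about", "products"],
    "about": ["index"],
    "products": ["index", "item1"],
    "item1": [],
}
ranks = pagerank(site)
print(max(ranks, key=ranks.get))  # index
```

A designer could, for instance, check whether the pages meant to be most prominent actually receive the highest scores. This is an illustrative use, not the paper's criterion.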
Incremental Web Usage Mining Based on Active Ant Colony Clustering
11
Authors: SHEN Jie, LIN Ying, CHEN Zhimin. Wuhan University Journal of Natural Sciences [CAS], 2006, No. 5, pp. 1081-1085 (5 pages).
To alleviate the scalability problem caused by growing Web usage and changing user interests, this paper presents a novel Web usage mining algorithm: Incremental Web Usage Mining based on Active Ant Colony Clustering. First, the paper proposes an active movement strategy for direction selection and speed, unlike the positive strategy used by other ant colony clustering algorithms. This strategy is the basis of an Active Ant Colony Clustering algorithm. The algorithm avoids idle movement and the "flying over the plane" phenomenon, and it effectively improves the quality and speed of clustering on large datasets. Then a mechanism for decomposing clusters, built on these methods, is introduced to form new clusters when users' interests change. Empirical studies on a real Web dataset show that the active ant colony clustering algorithm performs better than previous algorithms. They also show that the incremental approach based on the proposed mechanism implements incremental Web usage mining efficiently.
Keywords: web usage mining; ant colony clustering; incremental mining
The Study on Network Education based on Web Data Mining
12
Author: Chen Jing. International English Education Research, 2014, No. 7, pp. 83-85 (3 pages).
Since web usage pattern mining emerged in the 1990s, it has developed rapidly because of its wide range of applications. By identifying user interests and finding important pages, web usage pattern mining can help network education systems better meet personalized requirements.
Keywords: web usage mining; network education; personalized requirement
Matrix dimensionality reduction for mining typical user profiles (cited by 2)
13
Authors: 陆建江 (Lu Jianjiang), 徐宝文 (Xu Baowen), 黄刚石 (Huang Gangshi), 张亚非 (Zhang Yafei). Journal of Southeast University (English Edition) [EI, CAS], 2003, No. 3, pp. 231-235 (5 pages).
Clustering techniques have recently been used to discover typical user profiles automatically. Session vectors are usually high-dimensional and sparse, so designing an effective similarity measure between them is generally a challenging problem. Two approaches to mining typical user profiles based on matrix dimensionality reduction are presented. In both approaches, non-negative matrix factorization reduces the dimensionality of the session-URL matrix. The projections of the user-session vectors are then clustered into typical user-session profiles with the spherical k-means algorithm. The results show that both algorithms succeed in mining many typical user profiles from the user sessions.
Keywords: web usage mining; non-negative matrix factorization; spherical k-means algorithm
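The Python/NumPy sketch below follows the pipeline the abstract describes but is not the authors' two algorithms. It factorizes a small, made-up session-URL matrix with the Lee-Seung multiplicative updates for non-negative matrix factorization. It then clusters the rows of `W` (the sessions projected onto the reduced space) with spherical k-means, which is k-means on unit vectors using cosine similarity. The deterministic farthest-point initialization is a choice made here for reproducibility.

```python
import numpy as np

def nmf(V, rank, iters=300, seed=0):
    """Approximate V (sessions x URLs) by W @ H with non-negative factors."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 0.1
    H = rng.random((rank, V.shape[1])) + 0.1
    eps = 1e-9  # guards against division by zero
    for _ in range(iters):
        # Lee-Seung multiplicative updates for the Frobenius-norm objective.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def spherical_kmeans(X, k, iters=50):
    """Cluster rows of X by cosine similarity; returns one label per row."""
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    # Farthest-point initialisation: start at row 0, then add the least similar row.
    centers = [X[0]]
    while len(centers) < k:
        best_sim = np.max(X @ np.array(centers).T, axis=1)
        centers.append(X[np.argmin(best_sim)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmax(X @ centers.T, axis=1)
        for j in range(k):
            total = X[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # leave an empty cluster's centre unchanged
                centers[j] = total / norm
    return labels

# Hypothetical session-URL hit counts: sessions 0-2 use URLs 0-1, sessions 3-5 use URLs 2-3.
V = np.array([[1, 1, 0, 0], [2, 2, 0, 0], [3, 3, 0, 0],
              [0, 0, 1, 2], [0, 0, 2, 4], [0, 0, 1, 2]], dtype=float)
W, H = nmf(V, rank=2)
print(spherical_kmeans(W, k=2))
```

On this toy matrix the first three sessions and the last three are expected to land in different clusters. Each cluster then plays the role of one typical user profile.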
A Chinese Web Page Clustering Algorithm Based on the Suffix Tree (cited by 4)
14
Author: YANG Jian-wu. Wuhan University Journal of Natural Sciences [EI, CAS], 2004, No. 5, pp. 817-822 (6 pages).
In this paper, an improved algorithm named STC-I is proposed for clustering Chinese Web pages. Based on the characteristics of the Chinese language, it adopts a new unit-choice principle and a novel suffix tree construction policy. Experimental results show that the new algorithm keeps the advantages of STC and outperforms STC in precision and speed when clustering Chinese Web pages.
CLC number: TP 311. Foundation item: Supported by the National Information Industry Development Foundation of China. Biography: YANG Jian-wu (1973-), male, Ph.D.; research direction: information retrieval and text mining.
Keywords: clustering; suffix tree; web mining
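The abstract does not detail STC-I's unit-choice principle or tree-construction policy. For orientation only, the Python sketch below mimics the two phases of the original STC on character-level units:

1. Collect base clusters: phrases of up to `max_len` characters shared by at least two pages.
2. Join base clusters whose document sets overlap by more than half in both directions.

A dictionary of enumerated phrases stands in for a real suffix tree. It therefore yields more, partly redundant base clusters and is far less efficient. The page titles are invented.

```python
from collections import defaultdict

def base_clusters(docs, max_len=3, min_docs=2):
    """Map each phrase of up to max_len units to the set of docs containing it."""
    phrase_docs = defaultdict(set)
    for doc_id, units in enumerate(docs):
        for start in range(len(units)):
            for end in range(start + 1, min(start + max_len, len(units)) + 1):
                phrase_docs[tuple(units[start:end])].add(doc_id)
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}

def merge_clusters(clusters, threshold=0.5):
    """Connected components of base clusters that overlap > threshold both ways."""
    sets = list(clusters.values())
    parent = list(range(len(sets)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            shared = len(sets[i] & sets[j])
            if shared / len(sets[i]) > threshold and shared / len(sets[j]) > threshold:
                parent[find(i)] = find(j)
    groups = defaultdict(set)
    for i, docs in enumerate(sets):
        groups[find(i)] |= docs
    return list(groups.values())

# Hypothetical page titles, split into single Chinese characters as units.
pages = [list("网页聚类算法"),   # "web page clustering algorithm"
         list("网页聚类研究"),   # "web page clustering research"
         list("后缀树构造"),     # "suffix tree construction"
         list("后缀树算法")]     # "suffix tree algorithm"
print(merge_clusters(base_clusters(pages)))
```

On these titles the result is three overlapping clusters: {0, 1}, {2, 3} and {0, 3}. The last one comes from the shared word 算法 ("algorithm"). Unlike partitional clustering, STC allows such overlap.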
A Method of Eliminating Noises in Web Pages by Style Tree Model and Its Applications (cited by 2)
15
Authors: ZHAO Cheng-li, YI Dong-yun. Wuhan University Journal of Natural Sciences [EI, CAS], 2004, No. 5, pp. 611-616 (6 pages).
A Web page typically contains many information blocks. Besides the main content blocks, it usually has blocks such as navigation panels, copyright and privacy notices, and advertisements, which we call noisy blocks. Noise in Web pages can seriously harm Web data mining. To eliminate this noise, we introduce a new tree structure called the Style Tree and study an algorithm for constructing a site style tree. The Style Tree model is used to detect and eliminate noise in any Web page of the site. We also construct an information-based measure that determines which element nodes are noisy. The applications of this method are discussed in detail. Experimental results show that our noise elimination technique improves the mining results significantly.
CLC number: TP 339. Foundation item: Supported by the National Natural Science Foundation of China (60003013). Biography: ZHAN Cheng-li (1979-), male, Master's candidate; research direction: intelligent information systems.
Keywords: noise elimination; DOM tree; style tree; web mining
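The abstract does not define its information-based measure. One common choice in style-tree noise removal, assumed here rather than taken from the paper, scores an element node by the entropy of the presentation styles it takes across the m pages that contain it, using log base m. A node rendered identically on every page (a navigation bar, say) scores 0 and is likely noise. A node whose layout differs on every page scores 1. A minimal Python sketch, with hypothetical style strings:

```python
import math
from collections import Counter

def node_importance(styles):
    """Entropy (log base m) of the styles a node takes across the m pages containing it.

    styles: one presentation style per page, e.g. the node's child-tag sequence.
    Returns a value in [0, 1]; low values suggest repeated boilerplate.
    """
    m = len(styles)
    if m <= 1:
        return 1.0  # seen on a single page: treated as content
    h = -sum((c / m) * math.log(c / m, m) for c in Counter(styles).values())
    return max(0.0, h)  # turns -0.0 into 0.0

nav_bar = ["ul>li*5"] * 10                     # same on all 10 pages
article = [f"p*{n}" for n in range(1, 11)]     # different on every page
print(node_importance(nav_bar), round(node_importance(article), 6))  # 0.0 1.0
```

Where to draw the line between noisy and meaningful nodes (for example, a threshold such as 0.2) would be a tuning choice. The abstract gives no value.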
Using MWD: A Business Intelligence System for Tourism Destination Web (cited by 2)
16
Authors: Aurkene Alzua-Sorzabal, Jon Kepa Gerrikagoitia, Fidel Rebón. Management Studies, 2014, No. 1, pp. 62-72 (11 pages).
The Internet matters to tourism as a mass medium because it is an important marketing channel for the institutions and business networks of tourist destinations. However, the processes that follow, such as managing, maintaining, improving and exploiting this online presence, have seldom been studied in depth. A website is interactive, acting as both transmitter and receiver of information, and this has attracted scholars' attention. The interaction opens new approaches to studying network traffic (the pages a user has visited, their order, the time spent on them, the actions carried out, and so on) and cyber behavior. Information flows from the physical world to the cyber world and back, adapting the converged world to human behavior and social dynamics. Internet-based business intelligence systems let organizations act intelligently on time-sensitive business processes and benefit from analytics. As a result, they make it possible to anticipate and estimate visitor habits in a changing environment. This paper presents the research and technological fields incorporated into the study of the destination web. The destination web is an Internet-based business intelligence tool that aims to improve the performance of local managers or tour operators by giving them better insight into visitors' behavior on the website. The paper also outlines future research trends.
Keywords: tourism destination web monitor; web mining; web analytics; business intelligence system
Web Fuzzy Clustering and a Case Study
17
Authors: LIU Mao-fu, HE Jing, HE Yan-xiang, HU Hui-jun. Wuhan University Journal of Natural Sciences [EI, CAS], 2004, No. 4, pp. 411-414 (4 pages).
We combine web usage mining and fuzzy clustering, introduce the concept of web fuzzy clustering, and propose a web fuzzy clustering processing model, which is discussed in detail. Web fuzzy clustering can be used to cluster both web users and web pages. Finally, a case study is given, and its results demonstrate the feasibility of using web fuzzy clustering for web page clustering.
CLC number: TP 391. Foundation item: Supported by the National Natural Science Foundation of China (90104005). Biography: LIU Mao-fu (1977-), male, Ph.D. candidate; research directions: artificial intelligence, web mining, image mining.
Keywords: web mining; web usage mining; web fuzzy clustering; WFCM
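The abstract does not spell out the WFCM processing model. For reference, the Python sketch below is the standard fuzzy c-means algorithm that web fuzzy clustering builds on: it alternates membership and centre updates using a fuzzifier `m`. The two-dimensional points standing in for web users or pages are hypothetical. The farthest-point initialization is a choice made here to keep the example deterministic.

```python
import math

def fuzzy_c_means(points, c, m=2.0, iters=100):
    """Return (centres, memberships) where memberships[i][j] is point i's degree in cluster j."""
    # Deterministic start: first point, then repeatedly the point farthest from chosen centres.
    centres = [list(points[0])]
    while len(centres) < c:
        far = max(points, key=lambda p: min(math.dist(p, q) for q in centres))
        centres.append(list(far))
    dim = len(points[0])
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        u = []
        for p in points:
            d = [max(math.dist(p, q), 1e-12) for q in centres]
            u.append([1.0 / sum((d[j] / dk) ** (2 / (m - 1)) for dk in d) for j in range(c)])
        # Centre update: mean of the points weighted by u_ij ** m.
        centres = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            centres.append([sum(wi * p[k] for wi, p in zip(w, points)) / total
                            for k in range(dim)])
    return centres, u

# Hypothetical feature vectors for six web users.
users = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, u = fuzzy_c_means(users, c=2)
print([max(range(2), key=lambda j: row[j]) for row in u])  # [0, 0, 0, 1, 1, 1]
```

Unlike hard clustering, each user keeps a degree of membership in every cluster. That is what lets a page or user belong partly to several interest groups.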
A Survey of Web Information System and Applications
18
Authors: HAN Yanbo, LI Juanzi, YANG Nan, LIU Qing, XU Baowen, MENG Xiaofeng. Wuhan University Journal of Natural Sciences [CAS], 2007, No. 5, pp. 769-772 (4 pages).
The Fourth International Conference on Web Information Systems and Applications (WISA 2007) received 409 submissions and accepted 37 papers for publication in this issue. The papers cover broad research areas, including Web mining and data warehousing, the Deep Web and Web integration, P2P networks, text processing and information retrieval, and Web services and Web infrastructure. After briefly introducing the WISA conference, the survey uses the accepted papers to outline current activities and future trends in Web information systems and applications.
Keywords: web mining; data warehouse; Deep Web; web integration; web services; P2P computing; text processing; information retrieval; web security
An Efficient Mechanism for Product Data Extraction from E-Commerce Websites
19
Authors: Malik Javed Akhtar, Zahur Ahmad, Rashid Amin, Sultan H. Almotiri, Mohammed A. Al Ghamdi, Hamza Aldabbas. Computers, Materials & Continua [SCIE, EI], 2020, No. 12, pp. 2639-2663 (25 pages).
A large amount of data on the web can be put to useful purposes such as product recommendation, price comparison and demand forecasting for a particular product. Websites are designed for human understanding, not for machines, so making their data machine-readable requires techniques for extracting data from web pages. Researchers have addressed the problem with two approaches: knowledge engineering and machine learning. State-of-the-art knowledge engineering approaches identify data records on a web page using document structure, visual cues, clustering of data-record attributes and text processing techniques. Machine learning approaches learn rules from annotated pages and use them to extract data from unseen web pages. Because the structure of web documents is continuously evolving, new techniques are needed to handle the emerging requirements of web data extraction. In this paper, we present a novel, simple and efficient technique that extracts data from web pages using the visual styles and structure of documents. The proposed technique detects the Rich Data Region (RDR) using the query and words correlated with the query. The RDR is then divided into data records using style similarity. Noisy elements are removed using a Common Tag Sequence (CTS) and formatting entropy. The system is implemented in Java and runs on a dataset of real-world working websites. Its effectiveness is evaluated with precision, recall and F-measure and compared against five existing systems, with encouraging results.
Keywords: Document Object Model; rich data region; common tag sequence; web data extraction; deep web mining
Web multimedia information retrieval using improved Bayesian algorithm (cited by 3)
20
Authors: 余铁军 (Yu Tiejun), 陈纯 (Chen Chun), 余铁民 (Yu Tiemin), 林怀忠 (Lin Huaizhong). Journal of Zhejiang University Science [EI, CSCD], 2003, No. 4, pp. 415-420 (6 pages).
The main thrust of this paper is applying a novel data mining approach to the log of user feedback to improve web multimedia information retrieval performance. A user space model is constructed through data mining and then integrated into the original information space model, making the new information space model more accurate. The model can remove clutter and irrelevant text information. It also helps eliminate the mismatch between the page author's expression and the user's understanding and expectations. The user space model is also used to discover the relationship between high-level and low-level features when assigning weights. The authors propose an improved Bayesian algorithm for the data mining, and experiments show that it is efficient.
Keywords: relevance feedback; web log mining; improved Bayesian algorithm; user space model
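The abstract does not say how the Bayesian algorithm was improved. As a baseline for comparison only, the Python sketch below is an ordinary multinomial naive Bayes classifier with Laplace smoothing. It is trained on a hypothetical feedback log of (terms, label) pairs in which users marked results as relevant or irrelevant. The class name, terms and labels are invented.

```python
import math
from collections import Counter

class FeedbackNaiveBayes:
    """Multinomial naive Bayes over (terms, label) pairs from a feedback log."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # Laplace smoothing constant
        self.term_counts = {}         # label -> Counter of term occurrences
        self.doc_counts = Counter()   # label -> number of logged items
        self.vocab = set()

    def fit(self, log):
        for terms, label in log:
            self.doc_counts[label] += 1
            self.term_counts.setdefault(label, Counter()).update(terms)
            self.vocab.update(terms)
        return self

    def log_score(self, terms, label):
        """Unnormalised log posterior of label given the terms."""
        counts = self.term_counts[label]
        total = sum(counts.values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for t in terms:
            score += math.log((counts[t] + self.alpha) / (total + self.alpha * len(self.vocab)))
        return score

    def predict(self, terms):
        return max(self.term_counts, key=lambda label: self.log_score(terms, label))

feedback_log = [
    (["beach", "sunset", "photo"], "relevant"),
    (["sunset", "sea"], "relevant"),
    (["login", "banner", "ad"], "irrelevant"),
    (["ad", "copyright"], "irrelevant"),
]
model = FeedbackNaiveBayes().fit(feedback_log)
print(model.predict(["sunset", "photo"]), model.predict(["ad", "banner"]))
# relevant irrelevant
```

The scores are unnormalized log posteriors, which is enough for picking the most probable label.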