Abstract
Preprocessing Web log data is an important step in Web log analysis, and it is the foundation and prerequisite for extracting valuable information from those logs. This paper describes the algorithms and procedure used to preprocess a large volume of Web log data from the China-Russia Economic and Trade Cooperation Network (《中俄经贸合作网》), a site sponsored by the Ministry of Commerce. Known mappings between IP addresses and physical addresses are first stored in a HashMap; binary search and sequential search are then combined, which significantly reduces the number of lookups and improves query efficiency. Experimental results show that the method works well in practice and greatly improves the performance of the data preprocessing software.
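The abstract does not give the details of the algorithm, so the following is only a minimal sketch of one way a HashMap could be combined with binary and sequential search for IP-to-location lookup. The class name `IpLocator`, the sorted range-table layout, the `WINDOW` threshold, and the use of the HashMap as a cache of already-resolved IPs are all assumptions made for illustration, not the authors' implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; every name and the table layout are assumptions. */
class IpLocator {
    // Hypothetical range table: non-overlapping ranges sorted by start address.
    private final long[] starts;
    private final long[] ends;
    private final String[] locations;

    // Assumed role of the HashMap: remember IPs that were already resolved,
    // so repeated visitors in the log cost a single hash lookup.
    private final Map<Long, String> cache = new HashMap<>();

    // Below this interval size the search switches from halving to scanning.
    private static final int WINDOW = 4;

    IpLocator(long[] starts, long[] ends, String[] locations) {
        this.starts = starts;
        this.ends = ends;
        this.locations = locations;
    }

    /** Converts a dotted IPv4 string such as "10.0.0.1" to a number. */
    static long toLong(String dotted) {
        long value = 0;
        for (String part : dotted.split("\\.")) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    String lookup(String dotted) {
        long ip = toLong(dotted);
        String cached = cache.get(ip);
        if (cached != null) {
            return cached;
        }

        int lo = 0;
        int hi = starts.length - 1;
        // Binary phase: halve the candidate interval while it is still large.
        while (hi - lo >= WINDOW) {
            int mid = (lo + hi) >>> 1;
            if (starts[mid] <= ip) {
                lo = mid;      // a matching range cannot start before mid
            } else {
                hi = mid - 1;  // a matching range must start before mid
            }
        }

        // Sequential phase: scan the few remaining candidates.
        String result = "unknown";
        for (int i = lo; i <= hi; i++) {
            if (starts[i] <= ip && ip <= ends[i]) {
                result = locations[i];
                break;
            }
        }
        cache.put(ip, result);
        return result;
    }
}
```

In this sketch, a log preprocessor would build one `IpLocator` from the range table and call `lookup` for each log record; IPs that fall outside every range are reported as `"unknown"` and cached like any other result.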
Source
《河南科技大学学报(自然科学版)》
CAS
PKU Core (北大核心)
2009, No. 5, pp. 45-48 (4 pages)
Journal of Henan University of Science and Technology: Natural Science
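Funding
Ministry of Commerce 2009 "中外经贸合作网" (China-Foreign Economic and Trade Cooperation Network) project (413419)
"Project 211" Phase III Key Discipline Construction Project (73100042)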