Abstract
Fast and efficient subgroup set discovery (FSSD) is a subgroup discovery algorithm that aims to return a diverse pattern set in a short time. To reduce running time, however, FSSD selects a feature subset whose features have few domain values. When that subset is irrelevant or only weakly related to the target class, the quality of the pattern set decreases. To address this problem, this study proposes an FSSD algorithm based on ensemble feature selection. In the preprocessing stage, it applies ensemble feature selection based on ReliefF (Relief-F) and analysis of variance to obtain a feature subset that is both diverse and strongly correlated with the target class, and then runs FSSD to return a high-quality pattern set. Experimental results on UCI datasets and the National Health and Nutrition Examination Survey (NHANES) dataset show that the improved FSSD algorithm raises the quality of the pattern set and summarizes more interesting knowledge. On the NHANES dataset, the feature validity and positive predictive value of the pattern set are analyzed further.
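The preprocessing idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two selectors are combined, so the mean-rank aggregation below is an assumption, as are all function names. The Relief scorer is a simplified single-neighbor Relief, not the full ReliefF used in the paper, and the FSSD step itself is not shown.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F statistic of one numeric feature against class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand, k, n = mean(values), len(groups), len(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    return between / within if within > 0 else float("inf")

def relief_scores(X, y):
    """Simplified Relief: one nearest hit and one nearest miss per sample.
    Assumes every class has at least two samples. Not the full ReliefF."""
    n, d = len(X), len(X[0])
    # Feature ranges, used to normalize differences (1.0 for constant features).
    spans = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def dist(a, b):
        return sum(abs(a[j] - b[j]) / spans[j] for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [r for t, r in enumerate(X) if t != i and y[t] == y[i]]
        misses = [r for t, r in enumerate(X) if y[t] != y[i]]
        hit = min(hits, key=lambda r: dist(X[i], r))
        miss = min(misses, key=lambda r: dist(X[i], r))
        for j in range(d):
            # Reward features that separate classes, penalize within-class spread.
            w[j] += (abs(X[i][j] - miss[j]) - abs(X[i][j] - hit[j])) / spans[j] / n
    return w

def ranks(scores):
    """Rank 0 is the best (highest) score."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    r = [0] * len(scores)
    for pos, j in enumerate(order):
        r[j] = pos
    return r

def ensemble_select(X, y, k):
    """Return indices of the k features with the best mean rank (assumed rule)."""
    d = len(X[0])
    f_rank = ranks([anova_f([row[j] for row in X], y) for j in range(d)])
    r_rank = ranks(relief_scores(X, y))
    combined = [(f_rank[j] + r_rank[j]) / 2 for j in range(d)]
    return sorted(range(d), key=lambda j: combined[j])[:k]

if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [3.0, 4.9], [3.1, 1.1], [2.9, 3.1]]
    y = [0, 0, 0, 1, 1, 1]
    print(ensemble_select(X, y, k=1))
```

The selected feature indices would then be passed to the subgroup discovery step. Averaging ranks rather than raw scores avoids having to put the F statistic and the Relief weight on a common scale.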
Authors
ZHANG Yin (张崟); HE Zhen-Feng (何振峰)
College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China
Source
Computer Systems & Applications (《计算机系统应用》), 2022, No. 3, pp. 275-281 (7 pages)
Funding
Natural Science Foundation of Fujian Province (2018J01794).