Abstract
This paper reviews advances in influential point diagnosis from two perspectives: single influential points and multiple influential points. It highlights several recently developed methods for detecting influential points in high-dimensional data. These methods apply when the number of predictors far exceeds the sample size, and they can be regarded as generalizations of the classical Cook's distance to high-dimensional data. Cook's distance quantifies the influence of an individual observation on the least squares coefficient estimates, whereas the new methods capture the influence of an individual observation on the marginal correlations, which in turn has an important effect on variable selection and other downstream analysis tasks. Numerical simulation results verify the feasibility and effectiveness of the new methods.
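The contrast drawn in the abstract can be illustrated with a minimal sketch. It assumes a simple linear regression for Cook's distance and, for the high-dimensional case, a leave-one-out score that averages the squared change in marginal Pearson correlations over all predictors. The function names and the exact form of the score are illustrative assumptions, not necessarily the statistics reviewed in the article.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists (assumes non-constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cooks_distance_simple(x, y):
    """Cook's distance for the fit y = a + b*x (two coefficients, so p = 2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [c - (intercept + slope * a) for a, c in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)          # residual variance estimate
    lev = [1.0 / n + (a - mx) ** 2 / sxx for a in x]  # leverage h_i
    return [e * e * h / (2 * s2 * (1 - h) ** 2) for e, h in zip(resid, lev)]


def marginal_correlation_influence(X, y):
    """Hypothetical score: mean squared change in marginal correlations when row i is deleted.

    X is a list of n rows with p values each; p may exceed n because no
    regression is fitted, only p separate correlations are computed.
    """
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    full = [pearson(c, y) for c in cols]
    scores = []
    for i in range(n):
        y_del = y[:i] + y[i + 1:]
        total = 0.0
        for j, c in enumerate(cols):
            total += (full[j] - pearson(c[:i] + c[i + 1:], y_del)) ** 2
        scores.append(total / p)
    return scores


def make_demo_data(n=30, p=200, seed=0):
    """Simulated data (an assumption for illustration): y depends on the first predictor only,
    and observation 0 is replaced by an extreme point."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] + rng.gauss(0, 1) for row in X]
    X[0] = [10.0] * p
    y[0] = 100.0
    return X, y


X_demo, y_demo = make_demo_data()
scores_demo = marginal_correlation_influence(X_demo, y_demo)
most_influential = max(range(len(scores_demo)), key=scores_demo.__getitem__)
print("index with the largest marginal-correlation score:", most_influential)
```

Because the second score only needs marginal correlations, it stays computable when the number of predictors exceeds the sample size, where the least squares fit underlying Cook's distance is not uniquely defined.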
Authors
张欣
赵俊龙
ZHANG Xin; ZHAO Junlong (School of Statistics, Beijing Normal University, 100875, Beijing, China)
Source
Journal of Beijing Normal University (Natural Science) (《北京师范大学学报(自然科学版)》)
CAS
CSCD
PKU Core Journals (北大核心)
2023, No. 2, pp. 313-318 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (11871104, 12131006).
Keywords
influential point diagnosis
high-dimensional data
linear models
Cook distance
marginal correlation