Feature selection method based on K-S test and neighborhood rough sets
Abstract:
Feature genes chosen by traditional tumor gene selection algorithms often contain many noisy and redundant genes, which degrades both the accuracy of the algorithm and the classification precision. To address this problem, the Kolmogorov-Smirnov (K-S) test is combined with neighborhood rough sets to form a new feature selection method. First, the cumulative distribution function is used to compute the cumulative function values of the positive and negative samples and the K-S test statistic; the statistic is then compared against the sample statistic at the chosen significance level to remove redundant and noisy genes. Second, a reduction is performed with neighborhood rough sets, and the significance of the condition attributes is compared to obtain the optimal reduct. Finally, experiments compare the redundancy and classification accuracy obtained with the K-S test alone and with two K-S-test-based feature selection methods. The results show that the proposed method accurately selects tumor genes with significant discriminative ability and is efficient and feasible.
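The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the large-sample critical value formula, the Euclidean neighborhood, the radius `delta`, the forward greedy search, and all function names are assumptions made for the example, and the toy data is invented.

```python
from bisect import bisect_right
from math import log, sqrt


def ks_statistic(pos, neg):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    pos, neg = sorted(pos), sorted(neg)
    return max(
        abs(bisect_right(pos, x) / len(pos) - bisect_right(neg, x) / len(neg))
        for x in pos + neg
    )


def ks_critical(n_pos, n_neg, alpha=0.05):
    """Large-sample critical value (an approximation, assumed here)."""
    c = sqrt(-0.5 * log(alpha / 2))
    return c * sqrt((n_pos + n_neg) / (n_pos * n_neg))


def ks_filter(X, y, alpha=0.05):
    """Stage 1: keep features whose K-S statistic exceeds the critical value."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label != 1]
    crit = ks_critical(len(pos), len(neg), alpha)
    return [
        j for j in range(len(X[0]))
        if ks_statistic([r[j] for r in pos], [r[j] for r in neg]) > crit
    ]


def dependency(X, y, attrs, delta):
    """Fraction of samples whose delta-neighborhood on attrs is label-pure."""
    if not attrs:
        return 0.0
    pure = 0
    for xi, yi in zip(X, y):
        pure += all(
            yk == yi
            for xk, yk in zip(X, y)
            if sqrt(sum((xi[a] - xk[a]) ** 2 for a in attrs)) <= delta
        )
    return pure / len(X)


def neighborhood_reduct(X, y, candidates, delta=0.2):
    """Stage 2: greedily add the attribute with the largest dependency gain."""
    selected, best, remaining = [], 0.0, list(candidates)
    while remaining:
        gamma, attr = max(
            ((dependency(X, y, selected + [a], delta), a) for a in remaining),
            key=lambda pair: pair[0],
        )
        if gamma <= best:  # no attribute improves the dependency any further
            break
        selected.append(attr)
        remaining.remove(attr)
        best = gamma
    return selected


# Invented toy data: feature 0 separates the classes, feature 1 is noise,
# and feature 2 duplicates feature 0 (a redundant "gene").
f0 = [0.9, 0.8, 0.85, 0.95, 0.7, 0.1, 0.2, 0.15, 0.05, 0.3]
f1 = [0.5, 0.1, 0.9, 0.3, 0.7, 0.4, 0.2, 0.8, 0.6, 0.0]
X = [[a, b, a] for a, b in zip(f0, f1)]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

kept = ks_filter(X, y)                    # noise feature 1 is filtered out
reduct = neighborhood_reduct(X, y, kept)  # duplicate feature 2 is dropped
```

On this toy data the K-S stage discards the noise feature, and the rough set stage then removes the duplicate because it adds nothing to the dependency once feature 0 is selected. Real expression data would normally be scaled to a common range before a single `delta` is meaningful.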
Authors:
Liu Yan (刘艳); Cheng Lu (程璐); Sun Lin (孙林) (College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China)
Affiliation:
College of Computer and Information Engineering, Henan Normal University
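Source:
Journal of Henan Normal University (Natural Science Edition); indexed in CAS and the Peking University Core Journals list; 2019, No. 2, pp. 21-28 (8 pages)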
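Funding:
National Natural Science Foundation of China (61772176); China Postdoctoral Science Foundation (2016M602247); Henan Province Science and Technology Innovation Talent Project (184100510003); Henan Province Key Science and Technology Research Project (182102210362); Henan Province Training Program for Young Backbone Teachers in Higher Education Institutions (2017GGJS041)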
Keywords:
K-S test; neighborhood rough sets; feature selection
Classification code:
TP181 [Automation and Computer Technology: Control Theory and Control Engineering]