Tumor Feature Gene Selection Based on Signal-to-Noise Ratio and Random Forest
Abstract:
In tumor feature gene selection, traditional classification methods select a large number of redundant genes, which leads to low classification accuracy and high time complexity. To address these problems, this paper proposes a tumor feature gene selection method that combines signal-to-noise ratio filtering with the random forest algorithm. The method has two stages. First, signal-to-noise ratio filtering removes irrelevant and redundant genes from the original feature space, retaining the genes that are highly correlated with the class attribute and yielding a preselected feature subset with strong classification ability. Second, the random forest algorithm classifies the samples using this feature gene subset to obtain the final classification results. Experimental results show that the method selects tumor feature genes quickly and effectively and achieves high classification accuracy.
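The first stage described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact signal-to-noise formula, so the code assumes the commonly used two-class form, |mean1 - mean2| / (std1 + std2), computed per gene. The function names `snr_scores` and `select_top_genes`, the cutoff `k`, and the toy data are all hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def snr_scores(samples, labels):
    """Return one signal-to-noise score per gene (column).

    samples: list of expression rows, one row per sample.
    labels:  list of 0/1 class labels, one per sample.
    Assumed score: |mean_pos - mean_neg| / (std_pos + std_neg).
    """
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        pos = [row[g] for row, y in zip(samples, labels) if y == 1]
        neg = [row[g] for row, y in zip(samples, labels) if y == 0]
        gap = abs(mean(pos) - mean(neg))
        spread = stdev(pos) + stdev(neg)
        # Assumption: a gene with zero spread in both classes scores 0.
        scores.append(gap / spread if spread > 0 else 0.0)
    return scores


def select_top_genes(samples, labels, k):
    """Return the indices of the k highest-scoring genes, best first."""
    scores = snr_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): 6 samples x 4 genes.
# Genes 0 and 2 separate the two classes; genes 1 and 3 do not.
X = [
    [5.1, 1.0, 0.2, 3.0],
    [4.9, 2.0, 0.1, 1.0],
    [5.0, 3.0, 0.3, 2.0],
    [1.0, 2.0, 2.9, 2.0],
    [1.2, 1.0, 3.1, 3.0],
    [0.8, 3.0, 3.0, 1.0],
]
y = [1, 1, 1, 0, 0, 0]

selected = select_top_genes(X, y, k=2)
# Keep only the selected gene columns for the second stage.
X_reduced = [[row[g] for g in selected] for row in X]
```

In the second stage, the reduced matrix `X_reduced` and the labels `y` would be passed to a random forest classifier; scikit-learn's `RandomForestClassifier` is one possible choice, though the abstract does not say which implementation the authors used.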
Authors:
Xu Jiucheng (徐久成), Feng Sen (冯森), Mu Huiyu (穆辉宇) (College of Computer & Information Engineering, Henan Engineering Technology Research Center for Computing Intelligence & Data Mining, Henan Normal University, Xinxiang 453007, China)
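Affiliation:
College of Computer & Information Engineering, Henan Normal University; Henan Engineering Technology Research Center for Computing Intelligence & Data Mining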
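Source:
Journal of Henan Normal University (Natural Science Edition), indexed in CAS and the Peking University Core Journals list, 2017, No. 2, pp. 87-92 (6 pages)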
Funding:
National Natural Science Foundation of China (61370169, 61402153); Key Science and Technology Research Project of Henan Province (142102210056, 162102210261)
Keywords:
gene expression profiles; feature selection; signal-to-noise ratio; random forest
Classification code:
TP181 [Automation and Computer Technology: Control Theory and Control Engineering]