A Fuzzy Biclustering Approach Based on Rough Average Square Residue
Abstract:
Biclustering is an unsupervised learning method used to analyze gene expression data. To obtain biclusters of larger capacity and to address the weakness of traditional biclustering methods in capturing the coherent fluctuation of gene expression data, the upper and lower approximations of rough set theory are introduced and applied to a fuzzy biclustering algorithm. Combining the rough upper and lower approximations with the weighted mean square residue yields a new rough mean square residue, on which a fuzzy biclustering algorithm is then based. For a gene expression dataset, missing values are first imputed; next, a non-negative matrix factorization algorithm reduces the dimensionality of the gene dataset; finally, the rough mean square residue of the data matrix is computed, and rows and columns of the matrix are deleted and added according to a comprehensive evaluation measure function and the nearness-degree principle, producing biclustering results of larger capacity. Experimental results show that the proposed fuzzy biclustering algorithm is effective.
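The residue score at the center of the pipeline can be illustrated with a minimal Python sketch. It computes the classical mean squared residue of a submatrix (the Cheng and Church score), not the rough mean square residue proposed in the paper, whose definition the abstract does not give. The row-mean imputation, the function names, and the toy matrix are assumptions made for illustration only, and the non-negative matrix factorization step is omitted.

```python
import numpy as np


def fill_missing_with_row_mean(data):
    """Replace NaN entries with the mean of the observed values in the same row.

    Assumption: the abstract does not say which imputation method is used.
    """
    filled = data.copy()
    row_means = np.nanmean(filled, axis=1)
    nan_rows, nan_cols = np.where(np.isnan(filled))
    filled[nan_rows, nan_cols] = row_means[nan_rows]
    return filled


def mean_squared_residue(data, rows, cols):
    """Classical mean squared residue of the bicluster data[rows, cols].

    A score of 0 means the selected genes fluctuate perfectly coherently
    (each row is an additive shift of the others) over the selected conditions.
    """
    sub = data[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    overall_mean = sub.mean()
    residue = sub - row_means - col_means + overall_mean
    return float(np.mean(residue ** 2))


# Hypothetical toy expression matrix: 3 genes (rows) x 3 conditions (columns).
expr = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, np.nan, 0.0],
])

filled = fill_missing_with_row_mean(expr)
msr_coherent = mean_squared_residue(filled, [0, 1], [0, 1, 2])
msr_all = mean_squared_residue(filled, [0, 1, 2], [0, 1, 2])
print(msr_coherent, msr_all)
```

In this toy matrix the first two genes differ only by a constant shift, so their residue is zero, while adding the third gene raises the score. A row and column deletion or addition step would use such a score to decide which genes and conditions to keep.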
Authors:
Sun Lin (孙林), Liu Ruonan (刘弱南), Zhang Xiaoyu (张霄雨), Sun Yinjie (孙印杰), Song Liming (宋黎明)
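Affiliation:
College of Computer and Information Engineering; Henan Provincial University Engineering Technology Research Center for Computational Intelligence and Data Mining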
Source:
Journal (Natural Science Edition), indexed in CAS and the Peking University Core Journals list, 2017, No. 5, pp. 93-100 (8 pages)
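Funding:
National Natural Science Foundation of China (61402153, 61602158); China Postdoctoral Science Foundation (2016M602247); Key Scientific Research Project Plan of Higher Education Institutions of Henan Province (14A520069)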
Keywords:
rough set; rough average square residue; biclustering
Classification code:
TP181 [Automation and Computer Technology: Control Theory and Control Engineering]