Research on Academic Paper Recommendation Based on Doc2Vec and LDA Models Fused with Literature Quality
1: Library, China University of Mining and Technology
Abstract:
To relieve the information overload that massive electronic resources impose on readers, a content-based recommendation algorithm is used to recommend academic papers that are well matched in content and high in quality. To account for the contextual semantics, word order, and global topic information of the paper text, a hybrid semantic model combining Doc2Vec and LDA (Latent Dirichlet Allocation) is first trained on a corpus of abstracts from the candidate paper set to learn a text vector for each paper. The candidate papers are then clustered with the K-Means algorithm, and the members of the cluster to which the target paper belongs are taken as the papers to be recommended. Finally, literature quality weights are fused into the similarity calculation and ranking to obtain the TOP-N nearest-neighbor recommendation results. In an empirical analysis using CNKI journal papers in library and information science as the corpus, the hybrid model achieves higher precision and lower ranking discrepancy in its recommendation results than three traditional models, TF-IDF (Term Frequency-Inverse Document Frequency), Word2Vec, and LDA, and delivers good recommendation performance.
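The final two steps described in the abstract (restricting candidates to the target paper's cluster, then ranking by a quality-weighted similarity) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, since the abstract does not give the formulas: the hybrid vector is formed by concatenating an L2-normalized Doc2Vec vector with an L2-normalized LDA topic distribution, the score is cosine similarity multiplied by a quality weight, and the cluster labels are taken as given (for example, produced beforehand by K-Means). The names `hybrid_vector`, `recommend`, and the toy data are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def hybrid_vector(doc2vec_vec, lda_topic_dist):
    # Assumption: the two representations are fused by concatenating
    # their normalized forms; the paper may combine them differently.
    return l2_normalize(doc2vec_vec) + l2_normalize(lda_topic_dist)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def recommend(target_id, vectors, clusters, quality, top_n=3):
    """Rank same-cluster papers by cosine similarity times quality weight."""
    target_vec = vectors[target_id]
    target_cluster = clusters[target_id]
    scored = []
    for paper_id, vec in vectors.items():
        # Skip the target itself and papers outside its cluster.
        if paper_id == target_id or clusters[paper_id] != target_cluster:
            continue
        # Assumption: quality is fused multiplicatively into the score.
        scored.append((paper_id, cosine(target_vec, vec) * quality[paper_id]))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Toy data: (Doc2Vec vector, LDA topic distribution) per paper, all hypothetical.
vectors = {
    "p1": hybrid_vector([1.0, 0.0], [0.9, 0.1]),
    "p2": hybrid_vector([0.9, 0.1], [0.8, 0.2]),
    "p3": hybrid_vector([1.0, 0.0], [0.9, 0.1]),
    "p4": hybrid_vector([0.0, 1.0], [0.1, 0.9]),
}
clusters = {"p1": 0, "p2": 0, "p3": 0, "p4": 1}  # e.g. from a prior K-Means run
quality = {"p1": 1.0, "p2": 1.0, "p3": 0.5, "p4": 1.0}

for paper_id, score in recommend("p1", vectors, clusters, quality):
    print(paper_id, round(score, 3))
```

In this toy run, `p2` ranks ahead of `p3` even though `p3` has the same vector as the target, because `p3` carries a lower quality weight; `p4` is never considered because it falls in a different cluster.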
Keywords: academic papers; hybrid semantic model; literature quality; recommendation
Foundation: Philosophy and Social Science Research Project of Universities in Jiangsu Province (2022SJYB1129); National Social Science Fund of China (22BTQ023)
Authors: Wang Dafu (王大阜); Deng Zhiwen (邓志文); Jia Zhiyong (贾志勇); Wang Jing (王静)
Library, China University of Mining and Technology
DOI:10.16366/j.cnki.1000-2367.2023.04.005