[1] Wang Jinshui, Tang Zhengyi, Xue Xingsi. A text clustering algorithm based on part-of-speech tagging[J]. Journal of Fujian University of Technology, 2015, 13(04): 372-375. [doi:10.3969/j.issn.1672-4348.2015.04.014]

A text clustering algorithm based on part-of-speech tagging

Journal of Fujian University of Technology [ISSN: 2097-3853 / CN: 35-1351/Z]

Volume:
13
Issue:
No. 04, 2015
Pages:
372-375
Publication date:
2015-08-25

Article Info

Title:
A text clustering algorithm based on part-of-speech tagging
Author(s):
Wang Jinshui (王金水), Tang Zhengyi (唐郑熠), Xue Xingsi (薛醒思)
College of Information Science and Engineering, Fujian University of Technology
Keywords:
text clustering; part-of-speech tagging; natural language processing; cluster analysis
CLC number:
TP393.08
DOI:
10.3969/j.issn.1672-4348.2015.04.014
Document code:
A
Abstract:
Traditional text clustering is easily affected by noise. To address this problem, a text clustering algorithm based on part-of-speech tagging is proposed. The algorithm uses part-of-speech tagging to identify and extract the keywords that best characterize each text, and then clusters the texts based on the extracted keywords. Experiments show that, compared with traditional clustering algorithms, the proposed algorithm effectively improves the quality of the clustering results.
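The two-stage idea in the abstract (filter tokens by part of speech, then cluster on what remains) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which parts of speech are kept or which clustering algorithm is used. The toy lexicon `TOY_POS`, the choice of `KEPT_TAGS` (nouns and verbs), the greedy single-pass clustering, and the `threshold` value are all assumptions made for illustration only; a real system would use a trained part-of-speech tagger.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_POS = {
    "cat": "NOUN", "dog": "NOUN", "pet": "NOUN", "stock": "NOUN",
    "market": "NOUN", "price": "NOUN", "chases": "VERB", "rises": "VERB",
    "the": "DET", "a": "DET", "and": "CONJ", "quickly": "ADV",
}
# Assumption: nouns and verbs are treated as keywords; the abstract
# does not state which tags the paper actually keeps.
KEPT_TAGS = {"NOUN", "VERB"}


def extract_keywords(text):
    """Keep only tokens whose (toy) POS tag is in KEPT_TAGS."""
    return [t for t in text.lower().split() if TOY_POS.get(t) in KEPT_TAGS]


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, threshold=0.3):
    """Greedy single-pass clustering on keyword vectors.

    Each text joins the most similar existing cluster if the similarity
    to that cluster's summed keyword counts reaches `threshold`;
    otherwise it starts a new cluster.
    """
    clusters = []  # list of (summed keyword counts, member indices)
    for i, text in enumerate(texts):
        vec = Counter(extract_keywords(text))
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c[0])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append((vec, [i]))
        else:
            best[0].update(vec)  # fold the new text into the cluster
            best[1].append(i)
    return [members for _, members in clusters]


docs = [
    "the cat chases the dog",
    "a pet dog and a cat",
    "the stock market rises quickly",
    "the stock price rises",
]
print(cluster(docs))
```

In this sketch, function words such as "the" and "and" never reach the similarity computation, which is the kind of noise reduction the abstract describes; texts 0 and 1 share the keywords "cat" and "dog", and texts 2 and 3 share "stock" and "rises".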



Last Update: 2015-08-25