Lin Bin. Experimental comparison of K-Means text clustering by varied distance calculation methods[J]. Journal of FuJian University of Technology, 2016, 14(01): 80-85. [doi:10.3969/j.issn.1672-4348.2016.01.018]

Experimental comparison of K-Means text clustering by varied distance calculation methods

Journal of FuJian University of Technology [ISSN: 2097-3853 / CN: 35-1351/Z]

Volume:
14
Issue:
2016, No. 01
Pages:
80-85
Publication date:
2016-02-25

Article information

Title:
Experimental comparison of K-Means text clustering by varied distance calculation methods
Author(s):
Lin Bin
Department of Computer Science, Fuzhou Software Technology Vocational College
Keywords:
text clustering; TF-IDF; K-Means; distance calculation
CLC number:
TP311.13
DOI:
10.3969/j.issn.1672-4348.2016.01.018
Document code:
A
Abstract:
This paper studies the classification of text data. Samples were extracted from text files and weighted using the vector space model (VSM) and TF-IDF weighting, yielding a text similarity matrix. Clustering experiments were then run with the K-Means algorithm under several different sample distance calculation methods, and the resulting clusters were analysed and summarized. Based on the experimental findings, the differences among the distance calculation methods and the types of data each one suits were examined.
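The pipeline summarized in the abstract (TF-IDF weighted VSM vectors, then K-Means run under interchangeable distance measures) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the author's implementation: documents are assumed to be already word-segmented and whitespace-separated (Chinese text would first need a segmenter), TF is taken as count divided by document length with IDF = log(N/df), the three distances (Euclidean, Manhattan, cosine) are illustrative choices rather than the exact set compared in the paper, and the helper names `tfidf_vectors` and `kmeans`, the farthest-first seeding, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map whitespace-tokenized documents to TF-IDF weighted VSM vectors.

    Assumption: TF = count / document length, IDF = log(N / df).
    """
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = max(len(tokens), 1)  # guard against empty documents
        vectors.append(
            [counts[t] / length * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; treats a zero vector as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


def kmeans(vectors, k, distance, max_iter=100):
    """K-Means with a pluggable distance function; returns a label per vector."""
    # Hypothetical deterministic seeding (farthest-first): start at the first
    # vector, then add the vector farthest from all centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(distance(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: distance(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy, hypothetical corpus with two obvious topics.
    docs = [
        "apple banana fruit",
        "apple banana fruit salad",
        "apple banana fruit juice",
        "python code compile",
        "python code compile debug",
        "python code compile program",
    ]
    _, vecs = tfidf_vectors(docs)
    for name, dist in [("euclidean", euclidean), ("manhattan", manhattan),
                       ("cosine", cosine_distance)]:
        print(name, kmeans(vecs, 2, dist))
```

All three runs keep the standard mean update for centroids, so only the assignment step changes with the distance measure. For Manhattan distance the coordinate-wise median would minimize within-cluster distance (K-Medians), and for cosine distance unit-normalizing the vectors first (spherical K-Means) is a common variant; both are alternatives worth considering, not what this sketch does.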

Last updated: 2016-02-25