An Efficient Clustering Method To Find Similarity Between The Documents

Kalaivendhan.K; Sumathi.P

Special Issue Article Open Access

Abstract

Data mining is the process of extracting knowledge from large amounts of data. Clustering is a data mining technique used to group similar data items. The TF-IDF approach is used to calculate weights for the cluster, and a ranking method is used to rank the documents within it. In existing approaches, the similarity between a pair of objects is measured from a single viewpoint or from multiple viewpoints, with the values calculated using cosine similarity. In contrast, the proposed method is based on correlation similarity and uses the HAC algorithm for clustering the documents. Correlation similarity is used to calculate the similarity between every pair of documents in the cluster. The HAC algorithm groups the clusters level by level, and a ranking technique ranks each cluster according to the content of its documents. Finally, the most relevant data is grouped and the cluster results are displayed.
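
As a rough illustration of the pipeline the abstract describes, the following Python sketch computes TF-IDF vectors, measures correlation similarity between documents, and merges clusters level by level. It is not the authors' implementation. It assumes that HAC denotes hierarchical agglomerative clustering, that correlation similarity means the Pearson correlation coefficient, that TF-IDF uses the common (term frequency) x log(N / document frequency) weighting, and that clusters are merged by average linkage; none of these details are stated in the abstract. The function names and sample documents are hypothetical, and the ranking step is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # guard against empty documents
        # Assumed weighting: (count / document length) * log(N / df).
        vectors.append(
            [(counts[t] / length) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def correlation_similarity(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / norm if norm else 0.0


def hac(vectors, n_clusters):
    """Merge the two most similar clusters until n_clusters remain.

    Average linkage is an assumption; the abstract does not name a linkage.
    """
    n = len(vectors)
    # Pairwise document similarities, keyed by (smaller index, larger index).
    sim = {
        (i, j): correlation_similarity(vectors[i], vectors[j])
        for i in range(n)
        for j in range(i + 1, n)
    }

    def linkage(a, b):
        pairs = [(min(i, j), max(i, j)) for i in a for j in b]
        return sum(sim[p] for p in pairs) / len(pairs)

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the highest average similarity.
        _, p, q = max(
            (
                (linkage(clusters[p], clusters[q]), p, q)
                for p in range(len(clusters))
                for q in range(p + 1, len(clusters))
            ),
            key=lambda item: item[0],
        )
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return clusters


# Hypothetical sample documents, used only to exercise the functions.
docs = [
    "data mining extracts knowledge from data",
    "clustering groups similar data items",
    "the football team won the match",
    "the football match ended in a draw",
]
vocab, vectors = tfidf_vectors(docs)
print(hac(vectors, n_clusters=2))
```

Unlike cosine similarity, the Pearson correlation centers each vector on its mean before comparing, so it ranges from -1 to 1 even for non-negative TF-IDF weights.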
