Text document (TD) clustering is a recent trend in text mining in
which TDs are separated into several coherent clusters, so that all
documents in the same cluster are similar. This book puts forward a
new method for solving the TD clustering problem, established in two
main stages: (i) A new feature selection method based on a particle
swarm optimization algorithm with a novel weighting scheme is
proposed, together with a detailed dimension reduction technique, in
order to obtain a new, low-dimensional subset of more informative
features. This subset is subsequently used to improve the performance
of the text clustering (TC) algorithm and reduce its computation time.
The k-means clustering algorithm is used to evaluate the effectiveness
of the obtained subsets. (ii) Four krill herd algorithms (KHAs),
namely the (a) basic KHA, (b) modified KHA, (c) hybrid KHA, and (d)
multi-objective hybrid KHA, are proposed to solve the TC problem; each
algorithm represents an incremental improvement on its predecessor.
For the evaluation, seven benchmark text datasets with different
characteristics and complexities are used. The findings presented here
confirm that the proposed methods and algorithms delivered the best
results in comparison with other, similar methods found in the
literature.
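
To make the clustering task described above more concrete, the following is a minimal sketch of k-means applied to TF-IDF document vectors. It is not the book's method: it includes neither the particle swarm feature selection nor any krill herd algorithm. The whitespace tokenisation, the farthest-first initialisation, the function names, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for raw strings."""
    tokenised = [doc.lower().split() for doc in docs]  # naive tokeniser (assumption)
    vocab = sorted({t for tokens in tokenised for t in tokens})
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for tokens in tokenised for t in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vec = [tf[t] * (math.log(n_docs / df[t]) + 1.0) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(vectors, k):
    """Deterministic initialisation (assumption): start from the first
    document, then repeatedly add the vector farthest from all chosen ones."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until the
    assignment stops changing or max_iter is reached."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy corpus (hypothetical): three optimisation texts, three football texts.
docs = [
    "krill swarm optimization algorithm",
    "particle swarm optimization algorithm",
    "swarm optimization search algorithm",
    "football match goal score",
    "football team match score",
    "team goal football league",
]
vocab, vectors = tfidf_vectors(docs)
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

In this sketch every vocabulary term is a feature, so the vector length grows with the corpus. That growth is the dimensionality problem a feature selection stage, such as the one the book proposes, is meant to reduce.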
Product details
ISBN
9783030106744
Published
2018
Publisher
Springer Nature
Language
English
Format
E-book
Author