This book puts forward a new method for solving the text document (TD) clustering problem, which is established in two main stages: (i) A new feature selection method based on a particle swarm optimization algorithm with a novel weighting scheme is proposed, as well as a detailed dimension reduction technique, in order to obtain a new subset of more informative features with low-dimensional space. This new subset is subsequently used to improve the performance of the text clustering (TC) algorithm and reduce its computation time. The k-mean clustering algorithm is used to evaluate the effectiveness of the obtained subsets. (ii) Four krill herd algorithms (KHAs), namely, the (a) basic KHA, (b) modified KHA, (c) hybrid KHA, and (d) multi-objective hybrid KHA, are proposed to solve the TC problem; each algorithm represents an incremental improvement on its predecessor. For the evaluation process, seven benchmark text datasets are used with different characterizations and complexities.

Text document (TD) clustering is a new trend in text mining in which the TDs are separated into several coherent clusters, where all documents in the same cluster are similar. The findings presented here confirm that the proposed methods and algorithms delivered the best results in comparison with other, similar methods to be found in the literature.

Les mer
Presents a new method for solving the text document clustering problem and demonstrates that it can outperform other comparable methods Covers the main text clustering preprocessing steps and the metaheuristics needed in order to deal with the text document clustering problems Proposes methods that can be applied to a broad range of text documents (e.g. newsgroup documents appearing on newswires, Internet web pages, and hospital information), modern applications (technical reports and university data), and the biomedical sciences (large biomedical datasets)
Les mer
GPSR Compliance The European Union's (EU) General Product Safety Regulation (GPSR) is a set of rules that requires consumer products to be safe and our obligations to ensure this. If you have any concerns about our products you can contact us on ProductSafety@springernature.com. In case Publisher is established outside the EU, the EU authorized representative is: Springer Nature Customer Service Center GmbH Europaplatz 3 69115 Heidelberg, Germany ProductSafety@springernature.com
Les mer

Produktdetaljer

ISBN
9783030106737
Publisert
2019-01-03
Utgiver
Springer Nature Switzerland AG
Høyde
235 mm
Bredde
155 mm
Aldersnivå
Research, P, 06
Språk
Product language
Engelsk
Format
Product format
Innbundet
Antall sider
27