Semantics-Based Document Categorization Employing Semi-Supervised Learning
The result's identifiers
Result code in IS VaVaI
RIV/62156489:43110/17:43910951 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F62156489%3A43110%2F17%3A43910951)
Result on the web
http://dx.doi.org/10.4018/978-1-5225-1759-7.ch077
DOI - Digital Object Identifier
10.4018/978-1-5225-1759-7.ch077
Alternative languages
Result language
English
Original language name
Semantics-Based Document Categorization Employing Semi-Supervised Learning
Original language description
The automated categorization of unstructured textual documents according to their semantic content plays an important role, particularly given the ever-growing volume of such data originating from the Internet. Given a sufficient number of labeled examples, a suitable supervised machine-learning classifier can be trained. When no labels are available, an unsupervised learning method can be applied; however, the missing label information often leads to worse classification results. This chapter demonstrates a semi-supervised learning method in which a small set of manually labeled examples improves categorization compared with clustering and yields results comparable to those of supervised learning. For illustration, a real-world dataset from the Internet is used as input to supervised, unsupervised, and semi-supervised learning. Results are shown for different numbers of initial labeled samples used as "seeds" to automatically label the remaining unlabeled items.
Czech name
—
Czech description
—
Classification
Type
C - Chapter in a specialist book
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific research at universities
Others
Publication year
2017
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Book/collection name
Artificial Intelligence: Concepts, Methodologies, Tools, and Applications
ISBN
978-1-5225-1759-7
Number of pages of the result
29
Pages from-to
1884-1912
Number of pages of the book
3048
Publisher name
IGI Global
Place of publication
Hershey
UT code for WoS chapter
—