Augmenting Historical Alphabet Datasets Using Generative Adversarial Networks
The result's identifiers
Result code in IS VaVaI
RIV/60460709:41110/23:92542 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F60460709%3A41110%2F23%3A92542)
Result on the web
https://link.springer.com/chapter/10.1007/978-3-031-21438-7_11
DOI - Digital Object Identifier
10.1007/978-3-031-21438-7_11 (http://dx.doi.org/10.1007/978-3-031-21438-7_11)
Alternative languages
Result language
English
Original language name
Augmenting Historical Alphabet Datasets Using Generative Adversarial Networks
Original language description
In this paper, we present a method for expanding small classification datasets. Every research project, including text analysis, is based on data and methods. For historical texts written in different alphabets, Optical Character Recognition (OCR) algorithms are not always available. In many cases such texts must be transliterated and translated manually, or an OCR algorithm must be developed. Creating such an algorithm requires a large volume of input data. Each alphabet consists of elementary units: letters, vowels, or in some cases ideograms. The texts need to be segmented into these units, which are then classified. Obtaining a sufficient amount of data is often difficult and time-consuming, so augmentation methods are advisable. In our research, we propose using a Generative Adversarial Network to expand a relatively small dataset of Palmyrene letters and show that even adding generated data equal to one third of the size of the original dataset improves the classification results by 120 percent.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific research at universities
Others
Publication year
2023
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Data Science and Algorithms in Systems
ISBN
978-3-031-21438-7
ISSN
2367-3389
e-ISSN
—
Number of pages
10
Pages from-to
132-141
Publisher name
Springer
Place of publication
Gewerbestrasse 11, 6330 Cham, Switzerland
Event location
online (Praha)
Event date
Jan 1, 2022
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—