Towards Personal Data Anonymization for Social Messaging
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F21%3A00119196" target="_blank" >RIV/00216224:14330/21:00119196 - isvavai.cz</a>
Result on the web
<a href="https://link.springer.com/chapter/10.1007/978-3-030-83527-9_24" target="_blank" >https://link.springer.com/chapter/10.1007/978-3-030-83527-9_24</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-030-83527-9_24" target="_blank" >10.1007/978-3-030-83527-9_24</a>
Alternative languages
Result language
English
Original language name
Towards Personal Data Anonymization for Social Messaging
Original language description
We present a method for building text corpora for the supervised learning of text-to-text anonymization while maintaining a strict privacy policy. In our solution, personal data entities are detected, classified, and anonymized. We use available machine-learning methods, such as named-entity recognition, and improve their performance by grouping multiple entities into larger units based on the theory of tabular data anonymization. Experimental results on annotated Czech Facebook Messenger conversations show that our solution achieves recall comparable to that of human annotators. Precision, on the other hand, is much lower because of the low efficiency of named-entity recognition in the domain of social messaging conversations. The resulting anonymized text is of high utility because the replacement methods produce natural text.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
<a href="/en/project/GX19-27828X" target="_blank" >GX19-27828X: Modelling the future: Understanding the impact of technology on adolescent’s well-being</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2021
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Text, Speech, and Dialogue
ISBN
9783030835262
ISSN
0302-9743
e-ISSN
1611-3349
Number of pages
12
Pages from-to
281-292
Publisher name
Springer, Cham
Place of publication
Cham
Event location
Olomouc
Event date
Sep 6, 2021
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—