Unsupervised extraction, labelling and clustering of segments from clinical notes
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F22%3A00127605" target="_blank" >RIV/00216224:14330/22:00127605 - isvavai.cz</a>
Result on the web
<a href="https://arxiv.org/abs/2211.11799" target="_blank" >https://arxiv.org/abs/2211.11799</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/BIBM55620.2022.9995229" target="_blank" >10.1109/BIBM55620.2022.9995229</a>
Alternative languages
Result language
English
Original language name
Unsupervised extraction, labelling and clustering of segments from clinical notes
Original language description
This work is motivated by the scarcity of tools for accurate, unsupervised information extraction from unstructured clinical notes in computationally underrepresented languages, such as Czech. We introduce a stepping stone to a broad array of downstream tasks such as summarisation or integration of individual patient records, extraction of structured information for national cancer registry reporting or building of semi-structured semantic patient representations for computing patient embeddings. More specifically, we present a method for unsupervised extraction of semantically-labelled textual segments from clinical notes and test it out on a dataset of Czech breast cancer patients, provided by Masaryk Memorial Cancer Institute (the largest Czech hospital specialising in oncology). Our goal was to extract, classify (i.e. label) and cluster segments of the free-text notes that correspond to specific clinical features (e.g., family background, comorbidities or toxicities). The presented results demonstrate the practical relevance of the proposed approach for building more sophisticated extraction and analytical pipelines deployed on Czech clinical notes.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific university research
Others
Publication year
2022
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
ISBN
9781665468206
ISSN
—
e-ISSN
—
Number of pages
7
Pages from-to
1362-1368
Publisher name
IEEE
Place of publication
USA
Event location
USA
Event date
Jan 1, 2022
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—