Identifying Hidden Patterns from Health Administrative Claims by Means of “HAC2Vec” Embedding
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21460%2F24%3A00377149" target="_blank" >RIV/68407700:21460/24:00377149 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/978-3-031-62520-6_6" target="_blank" >http://dx.doi.org/10.1007/978-3-031-62520-6_6</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-031-62520-6_6" target="_blank" >10.1007/978-3-031-62520-6_6</a>
Alternative languages
Result language
English
Original language name
Identifying Hidden Patterns from Health Administrative Claims by Means of “HAC2Vec” Embedding
Original language description
Generative AI, particularly large language models (LLMs), and Natural Language Processing (NLP) techniques have recently taken on a significant role in healthcare applications of artificial intelligence (AI). This paper explores the utility of language technologies in deepening the understanding of Health Administrative Claims (HAC) data, a critical healthcare data source containing codes related to healthcare services. HAC data often lack essential clinical details, making it challenging to analyze disease phases, forms, and subtypes. However, distinctive patterns of codes within HAC data can potentially signify specific disease phenotypes, making language technologies valuable tools for analysis. To address this, we introduce the “HAC2vec-mean” method, which uses skip-gram neural networks to convert HAC sequences into numerical vectors. We employ random forest models for binary and multiclass classification tasks, achieving an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.86 for the International Classification of Diseases, 10th Revision (ICD-10). The paper presents data visualizations indicating the effectiveness of the approach in reducing data dimensionality and identifying patterns in patient profiles. Furthermore, it highlights the potential of this approach for cohort selection and index date specification. In conclusion, our study demonstrates the potential of NLP embeddings in enhancing the analysis of HAC data. This flexible framework offers improved insights into patient journeys and healthcare conditions, mitigating the limitations associated with traditional methods. Future work includes exploring the clinical relevance of identified patterns and enhancing explainability. Overall, this research opens doors to uncovering hidden structures with prognostic and therapeutic potential within HAC data.
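The description above mentions skip-gram embeddings of HAC code sequences and a method named “HAC2vec-mean”. The following is only a minimal sketch of two generic building blocks that such a pipeline could use, not the authors' implementation: how (center, context) training pairs are drawn from a code sequence for a skip-gram model, and how per-code vectors might be averaged into one vector per patient. That the "mean" in the method name refers to averaging is an assumption, as are the function names, the window size, the example codes, and the toy vectors, none of which come from the paper.

```python
from statistics import fmean


def skipgram_pairs(sequence, window=2):
    """Return (center, context) pairs, the examples a skip-gram model is trained on."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


def mean_embedding(sequence, vectors):
    """Average the vectors of all known codes in a sequence; None if no code is known."""
    known = [vectors[code] for code in sequence if code in vectors]
    if not known:
        return None
    # zip(*known) groups the values dimension by dimension.
    return [fmean(dim) for dim in zip(*known)]


# Hypothetical toy data: in practice the vectors would come from a trained
# skip-gram model, not be written by hand.
toy_vectors = {"E11": [1.0, 0.0], "I10": [0.0, 1.0], "E78": [1.0, 1.0]}
patient_codes = ["E11", "E78", "I10", "UNSEEN"]  # "UNSEEN" has no vector and is skipped

pairs = skipgram_pairs(patient_codes[:3], window=1)
patient_vector = mean_embedding(patient_codes, toy_vectors)
```

A fixed-length patient vector of this kind could then be passed to a classifier such as a random forest, which is the role the description assigns to the embedding.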
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
30304 - Public and environmental health
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organization
Others
Publication year
2024
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Advances in Digital Health and Medical Bioengineering, Proceedings of the 11th International Conference on E-Health and Bioengineering, EHB-2023, November 9–10, 2023, Bucharest, Romania – Volume 2: Health Technology Assessment, Biomedical Signal Processing, Medicine and Informatics
ISBN
978-3-031-62519-0
ISSN
1680-0737
e-ISSN
1433-9277
Number of pages
8
Pages from-to
45-52
Publisher name
Springer Nature Switzerland AG
Place of publication
Basel
Event location
Bucuresti
Event date
Nov 9, 2023
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
001326809000006