The reliability of a deep learning model in clinical out-of-distribution MRI data: A multicohort study

The result's identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11130%2F20%3A10415875" target="_blank" >RIV/00216208:11130/20:10415875 - isvavai.cz</a>
Alternative codes found
RIV/00216224:14740/20:00118222 RIV/00159816:_____/20:00073358 RIV/00064203:_____/20:10415875
Result on the web
<a href="https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=z_7w3c_QQz" target="_blank" >https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=z_7w3c_QQz</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.media.2020.101714" target="_blank" >10.1016/j.media.2020.101714</a>

Alternative languages

Result language
English
Original language name
The reliability of a deep learning model in clinical out-of-distribution MRI data: A multicohort study
Original language description
Deep learning (DL) methods have in recent years yielded impressive results in medical imaging, with the potential to function as a clinical aid to radiologists. However, DL models in medical imaging are often trained on public research cohorts with images acquired with a single scanner or with strict protocol harmonization, which is not representative of a clinical setting. The aim of this study was to investigate how well a DL model performs in unseen clinical datasets (collected with different scanners, protocols and disease populations) and whether more heterogeneous training data improves generalization. In total, 3117 MRI scans of brains from multiple dementia research cohorts and memory clinics, which had been visually rated by a neuroradiologist according to Scheltens' scale of medial temporal atrophy (MTA), were included in this study. By training multiple versions of a convolutional neural network on different subsets of this data to predict MTA ratings, we assessed the impact that including images from a wider distribution during training had on performance in external memory clinic data. Our results showed that our model generalized well to datasets acquired with protocols similar to those of the training data, but substantially worse in clinical cohorts with visibly different tissue contrasts in the images. This implies that future DL studies investigating performance in out-of-distribution (OOD) MRI data need to assess multiple external cohorts for reliable results. Further, by including data from a wider range of scanners and protocols the performance improved in OOD data, which suggests that more heterogeneous training data makes the model generalize better. To conclude, this is the most comprehensive study to date investigating the domain shift in deep learning on MRI data, and we advocate rigorous evaluation of DL models on clinical data prior to being certified for deployment.
Czech name
—
Czech description
—

Classification

Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
30224 - Radiology, nuclear medicine and medical imaging

Result continuities

Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation

Others

Publication year
2020
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

Name of the periodical
Medical Image Analysis
ISSN
1361-8415
e-ISSN
—
Volume of the periodical
66
Issue of the periodical within the volume
December
Country of publishing house
NL - THE KINGDOM OF THE NETHERLANDS
Number of pages
10
Pages from-to
101714
UT code for WoS article
000579512600001
EID of the result in the Scopus database
2-s2.0-85091795545

Similar results(10)

A deep learning fusion model for accurate classification of brain tumours in Magnetic Resonance images
Fully automated imaging protocol independent system for pituitary adenoma segmentation: a convolutional neural network - based model on sparsely annotated MRI
How intra-source imbalanced datasets impact the performance of deep learning for COVID-19 diagnosis using chest X-ray images
