Toward an Explainable Large Language Model for the Automatic Identification of the Drug-Induced Liver Injury Literature

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3A6TNPX2R8" target="_blank" >RIV/00216208:11320/25:6TNPX2R8 - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85202505565&doi=10.1021%2facs.chemrestox.4c00134&partnerID=40&md5=d9d56b0045286139e137e532021afebd" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85202505565&doi=10.1021%2facs.chemrestox.4c00134&partnerID=40&md5=d9d56b0045286139e137e532021afebd</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1021/acs.chemrestox.4c00134" target="_blank" >10.1021/acs.chemrestox.4c00134</a>

Alternative languages

Result language
English
Title in the original language
Toward an Explainable Large Language Model for the Automatic Identification of the Drug-Induced Liver Injury Literature
Result description in the original language
Drug-induced liver injury (DILI) stands as a significant concern in drug safety, representing the primary cause of acute liver failure. Identifying the scientific literature related to DILI is crucial for monitoring, investigating, and conducting meta-analyses of drug safety issues. Given the intricate and often obscure nature of drug interactions, simple keyword searching can be insufficient for the exhaustive retrieval of the DILI-relevant literature. Manual curation of DILI-related publications demands pharmaceutical expertise and is susceptible to errors, severely limiting throughput. Despite numerous efforts utilizing cutting-edge natural language processing and deep learning techniques to automatically identify the DILI-related literature, their performance remains suboptimal for real-world applications in clinical research and regulatory contexts. In the past year, large language models (LLMs) such as ChatGPT and its open-source counterpart LLaMA have achieved groundbreaking progress in natural language understanding and question answering, paving the way for the automated, high-throughput identification of the DILI-related literature and subsequent analysis. Leveraging a large-scale public dataset comprising 14 203 training publications from the CAMDA 2022 literature AI challenge, we have developed what we believe to be the first LLM specialized in DILI analysis based on LLaMA-2. In comparison with other smaller language models such as BERT, GPT, and their variants, LLaMA-2 exhibits an enhanced out-of-fold accuracy of 97.19% and area under the ROC curve of 0.9947 using 3-fold cross-validation on the training set. Despite LLMs’ initial design for dialogue systems, our study illustrates their successful adaptation into accurate classifiers for automated identification of the DILI-related literature from vast collections of documents. This work is a step toward unleashing the potential of LLMs in the context of regulatory science and facilitating the regulatory review process. © 2024 American Chemical Society.

Classification

Type
J<sub>SC</sub> - Article in a periodical in the SCOPUS database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
—
Linkages
—

Other

Year of implementation
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Periodical name
Chemical Research in Toxicology
ISSN
0893-228X
e-ISSN
—
Periodical volume
37
Issue of the periodical within the volume
9
Country of the periodical's publisher
US - United States of America
Number of pages of the result
11
Pages from-to
1524-1534
UT WoS code of the article
—
EID of the result in the Scopus database
2-s2.0-85202505565

