Large language models for biomolecular analysis: From methods to applications
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AIK9QJBT6" target="_blank" >RIV/00216208:11320/25:IK9QJBT6 - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85182873304&doi=10.1016%2fj.trac.2024.117540&partnerID=40&md5=ec9dfa4658b99fa5730c7dbcdf5d3ce8" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85182873304&doi=10.1016%2fj.trac.2024.117540&partnerID=40&md5=ec9dfa4658b99fa5730c7dbcdf5d3ce8</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.trac.2024.117540" target="_blank" >10.1016/j.trac.2024.117540</a>
Alternative languages
Result language
English
Title in original language
Large language models for biomolecular analysis: From methods to applications
Result description in original language
Large language models (LLMs) are proving useful in many fields, especially chemistry and biology. Biomolecular data is often represented sequentially, much like the textual data used to train LLMs. However, developing LLMs from scratch requires a substantial amount of data and computational resources, which may not be feasible for most researchers. A more workable solution is to change the inputs or parameters so that previously trained general LLMs can pick up the specific knowledge needed for biomolecular analysis. These adaptation strategies lower the amount of data and hardware needed, providing a more affordable option. This review introduces two popular LLM adaptation techniques, fine-tuning and prompt engineering, along with their uses in the analysis of molecules, proteins, and genes. A thorough overview of current common datasets and pre-trained models is also provided. The review outlines the possible advantages and difficulties of LLMs for biomolecular analysis, opening the door for chemists and biologists to use LLMs effectively in their future studies. © 2024 Elsevier B.V.
Classification
Type
J<sub>SC</sub> - Article in a periodical in the SCOPUS database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Year of publication
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Periodical name
TrAC - Trends in Analytical Chemistry
ISSN
0165-9936
e-ISSN
—
Periodical volume
171
Issue of the periodical within the volume
2024
Publisher country of the periodical
US - United States of America
Number of pages
9
Pages from-to
1-9
UT WoS code of the article
—
EID of the result in the Scopus database
2-s2.0-85182873304