Protein Family Sequence Generation through ProGen2 Fine-Tuning
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/24:10490448 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F24%3A10490448)
Result on the web
https://doi.org/10.1109/BIBM62325.2024.10822651
DOI - Digital Object Identifier
10.1109/BIBM62325.2024.10822651
Alternative languages
Result language
English
Original language name
Protein Family Sequence Generation through ProGen2 Fine-Tuning
Original language description
Proteins are biomolecules involved in virtually all biological processes, making the design of novel proteins with specific functions crucial for advancing drug development and biological research. Large protein sequence databases allow for training language models adapted from natural language processing, treating amino acid sequences as a biological "language". However, these generative protein language models lack a straightforward, user-friendly method for prompting them to generate specific sequences with desired properties. In this work, we demonstrate how the pre-trained protein language model ProGen2 can be effectively fine-tuned for controllable generation of protein sequences from several distinct protein families. We validate the generated sequences using various in-silico metrics and show that the model is able to generate viable protein sequences that exhibit low similarity to existing proteins.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
The result was created in the course of more than one project.
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2024
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
ISBN
979-8-3503-8622-6
ISSN
2156-1133
e-ISSN
—
Number of pages
3
Pages from-to
7058-7060
Publisher name
IEEE
Place of publication
USA
Event location
Lisbon, Portugal
Event date
Dec 3, 2024
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—