Fine-tuning of Conditional Transformers Improves the Generalization of Functionally Characterized Proteins
Conference proceedings contribution
Publication date:
2024
Citation:
Fine-tuning of Conditional Transformers Improves the Generalization of Functionally Characterized Proteins / M. Nicolini, D. Malchiodi, A. Cabri, E. Cavalleri, M. Mesiti, A. Paccanaro, N. Robinson Peter, J. Reese, E. Casiraghi, G. Valentini - In: Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS / [edited by] M. P. Guarino; K. Hotta; M. Yousef; H. Liu; G. Saggio; A. Fred; H. Gamboa. - [s.l] : SCITEPress, 2024. - ISBN 978-989-758-688-0. - pp. 561-568 (Paper presented at the 17th International Joint Conference on Biomedical Engineering Systems and Technologies, held in Rome in 2024) [10.5220/0012567900003657].
Abstract:
Conditional transformers improve the generative capabilities of large language models (LLMs) by processing specific control tags that drive the generation of texts with specific features. Recently, a similar approach has been applied to the generation of functionally characterized proteins by adding specific tags to the protein sequence to qualify their functions (e.g., Gene Ontology terms) or other characteristics (e.g., their family or the species to which they belong). In this work, we show that fine-tuning conditional transformers, pre-trained on large corpora of proteins, on specific protein families can significantly enhance the prediction accuracy of the pre-trained models, and that the fine-tuned models can also generate new, potentially functional proteins that could enlarge the protein space explored by natural evolution. We obtained encouraging results on the phage lysozyme family of proteins, achieving statistically significantly better prediction results than the original pre-trained model. A comparative analysis of the primary and tertiary structures of the synthetic proteins generated by our model and of the natural ones shows that the fine-tuned model is able to generate biologically plausible proteins. Our results suggest that fine-tuned conditional transformers can be applied to other functionally characterized proteins for possible industrial and pharmacological applications.
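As a minimal illustration of the tagging idea described in the abstract, the sketch below prepends control tags to an amino-acid sequence to form a single training string, and builds a tag-only prompt for conditional generation. The function names, the `<...>` tag syntax, and the example tag values are assumptions made for illustration; the abstract does not specify the format used by the authors' model.

```python
# Hypothetical sketch: the tag syntax and helper names are illustrative only,
# not the format used by the model described in the abstract.

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def format_tags(tags):
    """Render control tags as a prefix string, e.g. '<tagA><tagB>'."""
    return "".join("<{}>".format(tag) for tag in tags)


def tag_sequence(sequence, tags):
    """Prepend control tags to a protein sequence for conditional training."""
    unknown = set(sequence) - AMINO_ACIDS
    if unknown:
        raise ValueError("non-standard residues: {}".format(sorted(unknown)))
    return format_tags(tags) + sequence


def generation_prompt(tags):
    """A tag-only prompt: the model would be asked to continue with residues."""
    return format_tags(tags)


if __name__ == "__main__":
    # Placeholder tag values and a made-up short sequence fragment.
    example_tags = ["GO:0003796", "family:phage_lysozyme"]
    print(tag_sequence("MNIFEMLRIDEG", example_tags))
    print(generation_prompt(example_tags))
```

During fine-tuning, strings of the first kind would form the training set for a single family; at generation time, only the tag prefix would be supplied and the model would complete the sequence.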
IRIS type:
03 - Contribution in volume
Keywords:
Large Language Models; Protein Language Models; Conditional Transformers; Protein design and modeling
Author list:
M. Nicolini, D. Malchiodi, A. Cabri, E. Cavalleri, M. Mesiti, A. Paccanaro, N. Robinson Peter, J. Reese, E. Casiraghi, G. Valentini
Book title:
Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 1: BIOINFORMATICS