Title:  AI foundation models and prompt-based learning for biomedical research
Abstract:  AI foundation models, trained on large-scale data, offer unprecedented opportunities for many fields, including bioinformatics. The potential of these models is further magnified when combined with prompt-based learning, which achieves state-of-the-art results even with minimal labeled data. This presentation will cover some of our studies in this area. Specifically, we developed several prompt refinement techniques to enhance the prediction accuracy of large language models for mining gene relationships. In protein modeling, we utilized contrastive learning to develop a 3D structure-aware protein language model (S-PLM), which effectively incorporates 3D structure information into sequence-based embeddings. Additionally, we developed Prot2Token, a de novo foundation model for multitask protein prediction and design based on autoregressive language modeling. We prompted protein language models (PLMs) to improve various protein prediction tasks, such as signal peptide and targeting signal prediction. We also applied prompt-based learning to large single-cell RNA-seq models, enhancing the performance of several single-cell analysis tasks. The results of our research highlight the transformative capabilities of foundation models and prompt-based learning for a broad range of bioinformatics problems.