Nature Biotechnology, 2025-10
A trimodal protein language model enables advanced protein searches
Published in Nature Biotechnology, 2025
Trimodal contrastive model unifying sequence, structure, and function text for billion-scale protein search.
ProTrek is a trimodal language model that unifies protein sequence, structure, and natural-language function descriptions through contrastive learning, enabling searches between any two modalities as well as within a single modality. ProTrek surpasses current alignment tools (e.g., Foldseek and MMseqs2) in both speed and accuracy for identifying functionally related proteins. Computational and wet-lab validations show that the ProTrek server, with precomputed embeddings for over 5 billion proteins, efficiently processes large-scale protein repositories.
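As a rough illustration of why a shared embedding space makes cross-modal search straightforward, the sketch below ranks precomputed protein embeddings against a query embedding by cosine similarity. It is not ProTrek's code or API: the function names (`l2_normalize`, `cosine_search`), the three-dimensional toy vectors, and the protein labels are all hypothetical, and a billion-scale server would presumably rely on an approximate nearest-neighbor index rather than the exhaustive scan shown here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_search(query, database, top_k=2):
    """Return the top_k (name, score) pairs ranked by cosine similarity.

    `query` is an embedding from any modality (sequence, structure, or text);
    `database` maps hypothetical protein names to precomputed embeddings that
    are assumed to live in the same shared space as the query.
    """
    q = l2_normalize(query)
    scored = []
    for name, emb in database.items():
        e = l2_normalize(emb)
        score = sum(a * b for a, b in zip(q, e))
        scored.append((score, name))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


# Toy data: made-up embeddings standing in for precomputed protein vectors.
protein_db = {
    "protein_A": [0.9, 0.1, 0.0],
    "protein_B": [0.0, 1.0, 0.2],
    "protein_C": [0.7, 0.3, 0.1],
}

# A made-up embedding standing in for an encoded text (function) query.
text_query = [1.0, 0.0, 0.0]

hits = cosine_search(text_query, protein_db, top_k=2)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

Because every modality is assumed to map into the same space, the same function would serve text-to-protein, structure-to-sequence, or sequence-to-sequence queries; only the encoder that produces `query` would change.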
Citation
J Su, Y He, S You, S Jiang, X Zhou, X Zhang, Y Wang, et al. (2025). "A trimodal protein language model enables advanced protein searches." Nature Biotechnology.