Nature Biotechnology, 2025-10

A trimodal protein language model enables advanced protein searches

Published in Nature Biotechnology, 2025

Trimodal contrastive model unifying sequence, structure, and function text for billion-scale protein search.

Illustration for ProTrek trimodal protein search

ProTrek is a trimodal language model that uses contrastive learning to unify protein sequence, structure, and natural-language function descriptions. It supports searches between any two modalities as well as within a single modality. ProTrek surpasses current alignment tools (e.g., Foldseek and MMseqs2) in both speed and accuracy at identifying functionally related proteins. Computational and wet-lab validations show that the ProTrek server, with precomputed embeddings for over 5 billion proteins, efficiently processes large-scale protein repositories.
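To make the idea of searching over precomputed embeddings concrete, here is a minimal sketch of cross-modal retrieval by cosine similarity. It is not ProTrek's implementation or API: the function names (`normalize`, `cosine_search`), the protein identifiers, and the 3-dimensional vectors are all hypothetical, and a real system would use model-produced embeddings and an approximate nearest-neighbor index rather than a linear scan.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_search(query, index, top_k=2):
    """Rank every entry of `index` by cosine similarity to `query`.

    `index` maps an identifier to an embedding. The query and the indexed
    embeddings may come from different modalities, as long as they live in
    the same shared embedding space.
    """
    q = normalize(query)
    scored = []
    for protein_id, embedding in index.items():
        e = normalize(embedding)
        score = sum(a * b for a, b in zip(q, e))
        scored.append((protein_id, score))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: made-up "structure" embeddings for three proteins.
structure_index = {
    "protein_A": [0.9, 0.1, 0.0],
    "protein_B": [0.0, 1.0, 0.1],
    "protein_C": [0.7, 0.6, 0.1],
}

# Hypothetical embedding of a function-text query in the same space.
text_query_embedding = [1.0, 0.2, 0.0]

hits = cosine_search(text_query_embedding, structure_index, top_k=2)
for protein_id, score in hits:
    print(f"{protein_id}: {score:.3f}")
```

Because contrastive training pulls matching sequence, structure, and text embeddings close together in one space, the same ranking routine can in principle serve any query and target modality pair; only the encoder that produces the query vector changes.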

Citation

J Su, Y He, S You, S Jiang, X Zhou, X Zhang, Y Wang, et al. (2025). "A trimodal protein language model enables advanced protein searches." Nature Biotechnology.