The role of protein embeddings for protein–protein interaction prediction with graph neural networks

Abstract

Graph neural networks are a natural fit for protein–protein interaction (PPI) prediction because they exploit the graph structure of interactomes, yet it remains unclear how much they benefit from protein-level representations and which sources of protein information matter most. We address this by surveying the field and organizing it into a taxonomy of three paradigms: network-level approaches (shallow and deep graph representation learning), protein-level approaches (sequence, structure, and function), and hybrid methods that combine the two. The resulting synthesis highlights their respective strengths, complementarities, and tradeoffs.

To compare these paradigms systematically, we developed a unified framework that incorporates multiple representative methods and evaluates them on two widely used benchmarks: ogbl-ppa and HuRI. Our results indicate that hybrid methods improve performance over single-paradigm approaches, with function-driven representations emerging as the most informative protein-level representations for PPI prediction. This work provides a taxonomical map of the field, a reproducible framework for evaluating representative methods across paradigms, and practical guidance for future PPI prediction efforts.