Pattern Discovery in Protein Structures and Interaction Networks

Ahmed, Hazem Radwan A.
3D Structural Motif Matching, Protein Structure Classification, Protein Structure Alignment, Protein Interaction Networks, Protein-Protein Interaction Prediction, Multi-Start Particle Swarm Optimization, Fitness-based Agile Restart, Efficient Stagnation Detection, Proteomic Pattern Matching and Discovery, Protein Contact Maps
Pattern discovery in protein structures is a fundamental task in computational biology, with important applications in protein structure prediction, profiling and alignment. We propose a novel approach to pattern discovery in protein structures that uses Particle Swarm-based flying windows over potentially promising regions of the search space. A heuristic search based on Particle Swarm Optimization (PSO) is, however, easily trapped in local optima because of the sparse nature of the problem search space. We therefore introduce a novel fitness-based stagnation detection technique that effectively and efficiently restarts the search process to escape potential local optima. The proposed fitness-based method significantly outperforms the commonly used distance-based method when tested on eight classical and advanced (shifted/rotated) benchmark functions, as well as on two other applications for proteomic pattern matching and discovery. The main idea is to use the already-calculated fitness values of the swarm particles, instead of their pairwise distances, to predict an imminent stagnation. The fitness-based method therefore avoids the computational overhead of repeatedly calculating pairwise distances between all particles at each iteration. Moreover, it is less dependent on the problem search space than the distance-based method. The proposed pattern discovery algorithms are first applied to protein contact maps, which are compact 2D representations of protein structures. They are then extended to work on actual 3D protein structures and interaction networks, offering a novel and low-cost approach to protein structure classification and interaction prediction. Concerning protein structure classification, the proposed PSO-based approach correctly distinguishes between the positive and negative examples in two protein datasets over 50 trials.
As for protein interaction prediction, the proposed approach works effectively on complex, mostly sparse protein interaction networks. It predicts high-confidence protein-protein interactions, validated by more than one computational and experimental source, through knowledge transfer between topologically similar interaction patterns in close proximity. Such encouraging results demonstrate that pattern discovery in protein structures and interaction networks is a promising new application of the fast-growing and far-reaching PSO algorithms, which is the main argument of this thesis.
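The contrast between fitness-based and distance-based stagnation detection described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact criterion used in the thesis, so the functions `fitness_stagnated` and `distance_stagnated`, the tolerance `tol`, and the relative-spread rule below are all assumptions made for illustration only. The sketch shows only the cost difference: the fitness-based check reuses one already-computed value per particle, whereas the distance-based check visits every pair of particles.

```python
import math
from itertools import combinations


def fitness_stagnated(fitness_values, tol=1e-6):
    """Hypothetical fitness-based check (not the thesis's exact rule).

    Flags stagnation when the relative spread of the swarm's
    already-computed fitness values falls below `tol`.
    Cost: one pass over n values, no extra fitness evaluations.
    """
    best = min(fitness_values)
    worst = max(fitness_values)
    # Normalise by the fitness magnitude, with a floor of 1.0 to avoid
    # dividing by a value near zero.
    scale = max(abs(best), abs(worst), 1.0)
    return (worst - best) / scale < tol


def distance_stagnated(positions, tol=1e-6):
    """Distance-based check, shown for comparison (also an assumption).

    Flags stagnation when the mean pairwise Euclidean distance between
    particles falls below `tol`.
    Cost: n * (n - 1) / 2 distance computations per call.
    """
    pairs = list(combinations(positions, 2))
    if not pairs:
        # A swarm with fewer than two particles has no spread to measure.
        return True
    mean_dist = sum(math.dist(p, q) for p, q in pairs) / len(pairs)
    return mean_dist < tol


if __name__ == "__main__":
    # Toy swarm: three particles that have collapsed onto the same point.
    positions = [(0.5, 0.5), (0.5, 0.5), (0.5, 0.5)]
    fitness = [0.25, 0.25, 0.25]
    if fitness_stagnated(fitness):
        print("fitness-based check: restart the swarm")
    if distance_stagnated(positions):
        print("distance-based check: restart the swarm")
```

In a multi-start PSO loop, a check of this kind would be called once per iteration, and a positive result would trigger re-initialisation of the swarm while the best solution found so far is kept.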