High-performance Communication in MPI through Message Matching and Neighborhood Collective Design

Authors

Ghazimirsaeed, Seyedeh

Type

thesis

Language

eng

Keyword

Message Passing Interface, Message Matching, Neighborhood Collectives

Abstract

Message Passing Interface (MPI) is the de facto standard for communication in High Performance Computing (HPC). MPI processes compute on their local data while communicating extensively with each other. Communication is therefore the major bottleneck for performance. This dissertation presents several proposals for improving communication performance in MPI.

Message matching is on the critical path of communication in MPI, so it must be optimized to meet the scalability requirements of HPC applications. We propose clustering-based message matching mechanisms as well as a partner/non-partner message queue design. Both use the behavior of the application to categorize the communicating peers into groups and assign a dedicated queue to each group. The experimental evaluations show that the proposed approaches improve the queue search time and application runtime by up to 28x and 5x, respectively. We also propose a unified message matching mechanism that improves the message queue search time by distinguishing messages that come from point-to-point communication from those that come from collective communication. For collective elements, it dynamically profiles the impact of each collective call on the message queues and uses this information to adapt the queue data structure. For point-to-point elements, it uses the partner/non-partner queue design. The evaluation results show that we can improve the queue search time and application runtime by up to 80x and 5.5x, respectively. Furthermore, we exploit the vectorization capabilities of the many-core processors/coprocessors used in new HPC systems to improve message matching performance. The evaluation results show that we can improve the queue search time and application runtime by up to 4.5x and 2.92x, respectively.

Finally, we propose a collaborative communication mechanism based on common neighborhoods that might exist among groups of k processes. Such common neighborhoods are used to decrease the number of communication stages through message combining. We consider two design alternatives: topology-agnostic and topology-aware. The former ignores the physical topology of the system and the mapping of processes, whereas the latter takes them into account to further optimize the communication pattern. Our experimental results show improvements of up to 8x and 5.2x for various process topologies and a sparse matrix-matrix multiplication kernel, respectively.
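
As a rough illustration of the partner/non-partner queue idea mentioned in the abstract, the following Python sketch keeps one dedicated queue of posted receives per "partner" rank and a single shared queue for all other sources. It is a toy model under stated assumptions, not the dissertation's implementation: the class name `PartnerQueues`, the constant `ANY_SOURCE`, the fixed partner list, and the sequence-number scheme are all hypothetical, and tag wildcards and communicators are left out. The point it shows is that an incoming message only has to search the queue of its own source plus the shared queue, rather than one long list.

```python
from collections import deque

ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE in this sketch (assumption)


class PartnerQueues:
    """Toy posted-receive queue: one queue per partner rank, one shared queue."""

    def __init__(self, partners):
        # Dedicated queue for each frequently communicating peer.
        # How partners are chosen is outside this sketch (assumed given).
        self.partner_q = {p: deque() for p in partners}
        # Non-partner sources and wildcard receives share a single queue.
        self.shared_q = deque()
        # Global posting order, so the earliest-posted match always wins.
        self.seq = 0

    def post_recv(self, source, tag):
        """Record a posted receive for (source, tag) and return its entry."""
        entry = (self.seq, source, tag)
        self.seq += 1
        self.partner_q.get(source, self.shared_q).append(entry)
        return entry

    def match(self, source, tag):
        """Remove and return the earliest-posted receive matching a message."""
        candidates = []
        for q in (self.partner_q.get(source), self.shared_q):
            if q is None:
                continue  # the source is not a partner: no dedicated queue
            for entry in q:
                _, s, t = entry
                if s in (source, ANY_SOURCE) and t == tag:
                    candidates.append((entry, q))
                    break  # first hit is the oldest match in this queue
        if not candidates:
            return None
        # Compare sequence numbers across the two queues to keep posting order.
        entry, q = min(candidates, key=lambda c: c[0][0])
        q.remove(entry)
        return entry


queues = PartnerQueues(partners=[1, 2])
queues.post_recv(3, 0)           # non-partner source, goes to the shared queue
queues.post_recv(1, 7)           # partner source, goes to the queue for rank 1
queues.post_recv(ANY_SOURCE, 7)  # wildcard receive, goes to the shared queue
first = queues.match(1, 7)       # matches the receive posted for rank 1
```

The sequence numbers are one simple way to keep the sketch consistent with MPI's rule that a message matches the earliest posted receive it is eligible for, even when a wildcard receive and a source-specific receive sit in different queues.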

License

Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
