High-performance Communication in MPI through Message Matching and Neighborhood Collective Design
Authors
Ghazimirsaeed, Seyedeh
Date
Type
thesis
Language
eng
Keyword
Message Passing Interface, Message Matching, Neighborhood Collectives
Alternative Title
Abstract
Message Passing Interface (MPI) is the de facto standard for communication in High Performance Computing (HPC). MPI processes compute on their local data while extensively communicating with each other. Communication is therefore the major bottleneck for performance. This dissertation presents several proposals for improving communication performance in MPI.
Message matching is on the critical path of communication in MPI. Therefore, it has to be optimized given the scalability requirements of HPC applications. We propose clustering-based message matching mechanisms as well as a partner/non-partner message queue design that consider the behavior of the applications to categorize the communicating peers into groups and assign dedicated queues to each group. The experimental evaluations show that the proposed approaches improve the queue search time and application runtime by up to 28x and 5x, respectively.
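To illustrate the idea of dedicated queues per group of peers, the following is a minimal sketch and not the dissertation's implementation. It assumes a hypothetical, fixed set of partner ranks supplied by the caller (the abstract does not say how partners are selected), ignores wildcard receives such as `MPI_ANY_SOURCE`, and uses the made-up class name `PartnerQueues`. Each partner gets its own queue, all other ranks share one queue, and `match` reports how many entries it had to inspect.

```python
from collections import deque


class PartnerQueues:
    """Toy message queue: one FIFO per partner rank, one shared FIFO for the rest."""

    def __init__(self, partners):
        # Hypothetical input: ranks assumed to communicate frequently with us.
        self.partner_q = {rank: deque() for rank in partners}
        # Every non-partner rank falls back to this single shared queue.
        self.shared_q = deque()

    def _queue_for(self, source):
        return self.partner_q.get(source, self.shared_q)

    def enqueue(self, source, tag, payload):
        self._queue_for(source).append((source, tag, payload))

    def match(self, source, tag):
        """Remove the oldest message matching (source, tag).

        Returns (payload, entries_inspected); payload is None if nothing matched.
        """
        q = self._queue_for(source)
        for i, (src, t, payload) in enumerate(q):
            if src == source and t == tag:
                del q[i]
                return payload, i + 1
        return None, len(q)


queues = PartnerQueues(partners=[1])
for t in range(5):
    queues.enqueue(2, t, f"from-2-tag-{t}")   # non-partner traffic
queues.enqueue(1, 7, "from-partner")          # partner traffic

partner_result = queues.match(1, 7)   # searches only the partner's queue
shared_result = queues.match(2, 4)    # walks the shared queue
```

In this toy setup the partner's message is found after inspecting a single entry, even though five other messages arrived first, whereas the non-partner lookup has to walk the shared queue.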
We also propose a unified message matching mechanism that improves the message queue search time by distinguishing messages coming from point-to-point and collective communications. For collective elements, it dynamically profiles the impact of each collective call on message queues and uses this information to adapt the queue data structure. For point-to-point elements, it uses the partner/non-partner queue design. The evaluation results show that we can improve the queue search time and application runtime by up to 80x and 5.5x, respectively.
Furthermore, we consider the vectorization capabilities of the many-core processors/coprocessors used in new HPC systems to improve the message matching performance. The evaluation results show that we can improve the queue search time and application runtime by up to 4.5x and 2.92x, respectively.
Finally, we propose a collaborative communication mechanism based on common neighborhoods that might exist among groups of k processes. Such common neighborhoods are used to decrease the number of communication stages through message combining. We consider two design alternatives: topology-agnostic and topology-aware. The former ignores the physical topology of the system and the mapping of processes, whereas the latter takes them into account to further optimize the communication pattern. Our experimental results show that we can gain up to 8x and 5.2x improvement for various process topologies and a sparse matrix-matrix multiplication kernel, respectively.
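As a rough illustration of message combining over common neighborhoods, the sketch below counts messages for one group of processes with and without combining. It is an assumed, simplified scheme and not the dissertation's algorithm: one group member acts as a leader, the other members forward their data to it, and the leader sends a single combined message to each common neighbor. The function names and the example neighbor lists are hypothetical, and the message count is used only as a simple proxy for communication cost.

```python
def common_neighbors(out_neighbors, group):
    """Destinations that every process in `group` sends to."""
    return set.intersection(*(set(out_neighbors[p]) for p in group))


def message_counts(out_neighbors, group):
    """Return (direct, combined) message counts for one group.

    direct:   every process sends one message to each of its neighbors.
    combined: assumed leader-based scheme, see the comments below.
    """
    common = common_neighbors(out_neighbors, group)
    direct = sum(len(out_neighbors[p]) for p in group)

    # Step 1: each non-leader member forwards its data to the leader.
    intra_group = len(group) - 1
    # Step 2: the leader sends one combined message per common neighbor.
    to_common = len(common)
    # Step 3: every member still serves its non-common neighbors directly.
    to_private = sum(len(set(out_neighbors[p]) - common) for p in group)

    return direct, intra_group + to_common + to_private


# Hypothetical outgoing neighbor lists: ranks 0 and 1 share destinations 4, 5, 6.
out_neighbors = {0: [4, 5, 6, 7], 1: [4, 5, 6, 8]}
shared = common_neighbors(out_neighbors, [0, 1])
direct, combined = message_counts(out_neighbors, [0, 1])
```

Under this assumed scheme, combining pays off only when the group shares more than one neighbor, since the intra-group forwarding step adds messages of its own.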
Description
Citation
Publisher
License
Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.