
dc.contributor.author: Ghazimirsaeed, Seyedeh
dc.contributor.other: Queen's University (Kingston, Ont.). Theses (Queen's University (Kingston, Ont.)) [en]
dc.date.accessioned: 2019-04-02T20:32:43Z
dc.date.available: 2019-04-02T20:32:43Z
dc.identifier.uri: http://hdl.handle.net/1974/26078
dc.description.abstract: Message Passing Interface (MPI) is the de facto standard for communication in High Performance Computing (HPC). MPI processes compute on their local data while communicating extensively with each other, so communication is the major performance bottleneck. This dissertation presents several proposals for improving communication performance in MPI. Message matching is on the critical path of MPI communication and must therefore be optimized to meet the scalability requirements of HPC applications. We propose clustering-based message matching mechanisms as well as a partner/non-partner message queue design that consider the behavior of the applications to categorize communicating peers into groups and assign a dedicated queue to each group. The experimental evaluations show that the proposed approaches improve queue search time and application runtime by up to 28x and 5x, respectively. We also propose a unified message matching mechanism that improves message queue search time by distinguishing messages coming from point-to-point and collective communications. For collective elements, it dynamically profiles the impact of each collective call on the message queues and uses this information to adapt the queue data structure. For point-to-point elements, it uses the partner/non-partner queue design. The evaluation results show that we can improve queue search time and application runtime by up to 80x and 5.5x, respectively. Furthermore, we exploit the vectorization capabilities of the many-core processors/coprocessors used in modern HPC systems to improve message matching performance. The evaluation results show that we can improve queue search time and application runtime by up to 4.5x and 2.92x, respectively. Finally, we propose a collaborative communication mechanism based on the common neighborhoods that may exist among groups of k processes. These common neighborhoods are used to decrease the number of communication stages through message combining. We consider two design alternatives: topology-agnostic and topology-aware. The former ignores the physical topology of the system and the mapping of processes, whereas the latter takes them into account to further optimize the communication pattern. Our experimental results show improvements of up to 8x for various process topologies and up to 5.2x for a sparse matrix-matrix multiplication kernel. [en_US]
dc.language.iso: en [en_US]
dc.relation.ispartofseries: Canadian theses [en]
dc.rights: Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada [en]
dc.rights: ProQuest PhD and Master's Theses International Dissemination Agreement [en]
dc.rights: Intellectual Property Guidelines at Queen's University [en]
dc.rights: Copying and Preserving Your Thesis [en]
dc.rights: This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner. [en]
dc.subject: Message Passing Interface [en_US]
dc.subject: Message Matching [en_US]
dc.subject: Neighborhood Collectives [en_US]
dc.title: High-performance Communication in MPI through Message Matching and Neighborhood Collective Design [en_US]
dc.type: Thesis [en]
dc.description.degree: Doctor of Philosophy [en_US]
dc.contributor.supervisor: Afsahi, Ahmad
dc.contributor.department: Electrical and Computer Engineering [en_US]
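
Editorial note: the abstract above centers on MPI message matching, where each incoming message is matched against posted receives by its (source, tag, communicator) envelope. The following is a minimal illustrative sketch, not code from the thesis; the constant NTAGS and the buffer names are assumptions, and only standard MPI calls are used. It shows the kind of many-partner, many-tag traffic that lengthens the matching queues the proposed clustering-based and partner/non-partner designs aim to shorten.

/* matching_demo.c (hypothetical name): rank 0 pre-posts one receive per
 * (partner, tag) pair; every arriving message must be matched against this
 * set by its (source, tag, communicator) envelope, so a long list of posted
 * receives means a long matching-queue search.
 * Build/run (typical): mpicc matching_demo.c -o matching_demo && mpirun -np 4 ./matching_demo */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NTAGS 4   /* assumed number of tags per partner, for illustration */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Pre-post one nonblocking receive per (source, tag) combination. */
        int nreq = (size - 1) * NTAGS;
        MPI_Request *reqs = malloc(nreq * sizeof(MPI_Request));
        int *bufs = malloc(nreq * sizeof(int));
        int r = 0;
        for (int src = 1; src < size; src++)
            for (int tag = 0; tag < NTAGS; tag++, r++)
                MPI_Irecv(&bufs[r], 1, MPI_INT, src, tag,
                          MPI_COMM_WORLD, &reqs[r]);
        MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);
        printf("rank 0 matched %d messages\n", nreq);
        free(reqs);
        free(bufs);
    } else {
        /* Every other rank sends NTAGS messages with distinct tags. */
        for (int tag = 0; tag < NTAGS; tag++) {
            int payload = rank * NTAGS + tag;
            MPI_Send(&payload, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}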
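
The neighborhood-collective side of the work can likewise be pictured with a small, self-contained sketch under assumed settings (a ring topology chosen purely for illustration), again not the thesis implementation: a sparse process topology is declared with MPI_Dist_graph_create_adjacent, and one MPI_Neighbor_allgather call exchanges data with all neighbors at once. Because the library sees the whole neighborhood in a single call, it is free to combine messages and reorganize communication stages, which is the opportunity the topology-agnostic and topology-aware designs in the abstract exploit.

/* neighbor_demo.c (hypothetical name): a ring neighborhood collective. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process names its left and right ring neighbors; the same set is
     * used for incoming and outgoing edges. */
    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;
    int nbrs[2] = { left, right };

    MPI_Comm ring;
    MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                   2, nbrs, MPI_UNWEIGHTED,  /* in-edges  */
                                   2, nbrs, MPI_UNWEIGHTED,  /* out-edges */
                                   MPI_INFO_NULL, 0, &ring);

    /* Exchange one int with each neighbor in a single collective call. */
    int sendbuf = rank, recvbuf[2];
    MPI_Neighbor_allgather(&sendbuf, 1, MPI_INT,
                           recvbuf, 1, MPI_INT, ring);

    printf("rank %d received %d (left) and %d (right)\n",
           rank, recvbuf[0], recvbuf[1]);

    MPI_Comm_free(&ring);
    MPI_Finalize();
    return 0;
}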

