Show simple item record

dc.contributor.author: Mirsadeghi, Seyed
dc.contributor.other: Queen's University (Kingston, Ont.). Theses (Queen's University (Kingston, Ont.)) [en]
dc.date.accessioned: 2017-08-01T19:28:36Z
dc.date.available: 2017-08-01T19:28:36Z
dc.identifier.uri: http://hdl.handle.net/1974/16812
dc.description.abstract: High-Performance Computing (HPC) represents the flagship domain in providing high-end computing capabilities that play a critical role in helping humanity solve its hardest problems. Ranging from answering profound questions about the universe to finding a cure for cancer, HPC applications span nearly every aspect of our lives. The impressive power of HPC systems comes mainly from the massive number of processors (on the order of millions) that they provide. The efficiency of communications among these processors is the main bottleneck in the overall performance of HPC systems. This dissertation presents new algorithms for improving communication performance in HPC systems by exploiting topology information. We propose a parallel topology- and routing-aware mapping heuristic and a refinement algorithm that improve communication performance by achieving lower congestion across the network links. Our experimental results with 4,096 processors show that the proposed approach can provide more than 60% improvement in various mapping metrics compared to an initial in-order mapping of processes. Communication time is also improved by up to 50%. We also propose four topology-aware mapping heuristics designed specifically for collective communications in the Message Passing Interface (MPI). The heuristics provide a better match between the collective communication algorithm and the physical topology of the system, and decrease the communication latency by up to 78%. Furthermore, we expand topology-aware communications into the scope of accelerated computing. Using accelerators, especially Graphics Processing Units (GPUs), to speed up certain types of computations plays an increasingly important role in HPC. We present a unified framework for topology-aware process mapping and GPU assignment in multi-GPU systems. Our experimental results on two clusters with 64 GPUs show that the proposed approach improves communication performance by up to 91%. Finally, we present a novel distributed algorithm that uses the process topology information to design optimized communication schedules for MPI neighborhood collectives. The proposed algorithm finds the common neighborhoods in a distributed graph topology and exploits them as an opportunity to improve communication performance through message combining. The optimized schedules reduce the communication latency of MPI neighborhood collectives by more than 50%. [en_US]
dc.language.iso: en [en_US]
dc.relation.ispartofseries: Canadian theses [en]
dc.rights: Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada [en]
dc.rights: ProQuest PhD and Master's Theses International Dissemination Agreement [en]
dc.rights: Intellectual Property Guidelines at Queen's University [en]
dc.rights: Copying and Preserving Your Thesis [en]
dc.rights: This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner. [en]
dc.subject: HPC [en_US]
dc.subject: Topology [en_US]
dc.subject: MPI [en_US]
dc.subject: Collective Communications [en_US]
dc.subject: Neighborhood Collectives [en_US]
dc.subject: GPU [en_US]
dc.title: Improving Communication Performance through Topology and Congestion Awareness in HPC Systems [en_US]
dc.type: thesis [en_US]
dc.description.degree: Doctor of Philosophy [en_US]
dc.contributor.supervisor: Afsahi, Ahmad [en]
dc.contributor.department: Electrical and Computer Engineering [en]

