Improving Communication Performance through Topology and Congestion Awareness in HPC Systems

Authors

Mirsadeghi, Seyed

Type

thesis

Language

eng

Keyword

HPC, Topology, MPI, Collective Communications, Neighborhood Collectives, GPU

Abstract

High-Performance Computing (HPC) represents the flagship domain in providing high-end computing capabilities that play a critical role in helping humanity solve its hardest problems. Ranging from answering profound questions about the universe to finding a cure for cancer, HPC applications span nearly every aspect of our lives. The impressive power of HPC systems comes mainly from the massive number of processors (on the order of millions) that they provide. The efficiency of communications among these processors is the main bottleneck in the overall performance of HPC systems. This dissertation presents new algorithms for improving communication performance in HPC systems by exploiting topology information. We propose a parallel topology- and routing-aware mapping heuristic and a refinement algorithm that improves communication performance by achieving lower congestion across the network links. Our experimental results with 4,096 processors show that the proposed approach can provide more than 60% improvement in various mapping metrics compared to an initial in-order mapping of processes. Communication time is also improved by up to 50%. We also propose four topology-aware mapping heuristics designed specifically for collective communications in the Message Passing Interface (MPI). The heuristics provide a better match between the collective communication algorithm and the physical topology of the system, and decrease communication latency by up to 78%. Furthermore, we extend topology-aware communications to accelerated computing. Using accelerators, especially Graphics Processing Units (GPUs), to speed up certain types of computations plays an increasingly important role in HPC. We present a unified framework for topology-aware process mapping and GPU assignment in multi-GPU systems. Our experimental results on two clusters with 64 GPUs show that the proposed approach improves communication performance by up to 91%.
Finally, we present a novel distributed algorithm that uses the process topology information to design optimized communication schedules for MPI neighborhood collectives. The proposed algorithm finds the common neighborhoods in a distributed graph topology and exploits them as an opportunity to improve the communication performance through message combining. The optimized schedules reduce the communication latency of MPI neighborhood collectives by more than 50%.

License

Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
