
dc.contributor.author: Inozemtsev, Grigori
dc.date: 2014-05-29 11:55:53.87
dc.date.accessioned: 2014-05-30T15:32:14Z
dc.date.available: 2014-05-30T15:32:14Z
dc.date.issued: 2014-05-30
dc.identifier.uri: http://hdl.handle.net/1974/12215
dc.description: Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2014-05-29 11:55:53.87
dc.description.abstract: As the demands of computational science and engineering simulations increase, the size and capabilities of High Performance Computing (HPC) clusters are also expected to grow. Consequently, the software providing the application programming abstractions for these clusters must adapt to meet those demands. In particular, the increased cost of interprocessor synchronization and communication in larger systems must be accommodated. Non-blocking operations, which allow communication latency to be hidden by overlapping it with computation, have been proposed to mitigate this problem. In this work, we investigate offloading a portion of the communication processing to dedicated hardware in order to support communication/computation overlap efficiently. We work with the Message Passing Interface (MPI), the de facto standard for parallel programming in HPC environments. We investigate both point-to-point non-blocking communication and collective operations; our work with collectives focuses on the allgather operation. We develop designs for both flat and hierarchical cluster topologies and examine both eager and rendezvous communication protocols. We also develop a generalized primitive operation with the aim of simplifying further research into non-blocking collectives. We propose a new algorithm for the non-blocking allgather collective and implement it using this primitive; the algorithm has constant resource usage even when executing multiple operations simultaneously. We implement these designs using the CORE-Direct offloading support in Mellanox InfiniBand adapters. We present an evaluation of the designs using microbenchmarks and an application kernel. The evaluation shows that offloaded non-blocking communication operations can provide latency comparable to that of their blocking counterparts, while allowing most of the communication time to be overlapped with computation and remaining resilient to process arrival and scheduling variations.
dc.language.iso: eng
dc.relation.ispartofseries: Canadian theses
dc.rights: This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
dc.subject: MPI
dc.subject: offloading
dc.subject: high performance computing
dc.subject: computer engineering
dc.subject: InfiniBand
dc.subject: CORE-Direct
dc.title: Overlapping Computation and Communication through Offloading in MPI over InfiniBand
dc.type: thesis
dc.description.degree: M.A.Sc.
dc.contributor.supervisor: Afsahi, Ahmad
dc.contributor.department: Electrical and Computer Engineering
dc.degree.grantor: Queen's University at Kingston
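
The abstract above centers on hiding communication latency behind computation via non-blocking MPI operations, with the allgather collective as the main case study. The following minimal C sketch illustrates that overlap pattern using the standard MPI-3 MPI_Iallgather call; it is not the thesis's CORE-Direct offloaded implementation, and the BLOCK size and the compute_independent_work() placeholder are illustrative assumptions, not names from the work itself.

    /* Minimal sketch of communication/computation overlap with a
     * non-blocking allgather (standard MPI-3 interface). The thesis's
     * offloaded design targets the same overlap, performed by the
     * InfiniBand adapter rather than the host CPU. */
    #include <mpi.h>
    #include <stdlib.h>

    #define BLOCK 1024  /* doubles contributed per process (illustrative) */

    static void compute_independent_work(void) {
        /* Placeholder for computation that does not depend on the
         * gathered data. */
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *sendbuf = malloc(BLOCK * sizeof(double));
        double *recvbuf = malloc((size_t)size * BLOCK * sizeof(double));

        MPI_Request req;
        /* Start the allgather; the call returns immediately, leaving the
         * CPU free while the data moves. */
        MPI_Iallgather(sendbuf, BLOCK, MPI_DOUBLE,
                       recvbuf, BLOCK, MPI_DOUBLE,
                       MPI_COMM_WORLD, &req);

        /* Overlap window: do useful work while the collective progresses. */
        compute_independent_work();

        /* Complete the collective before reading recvbuf. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);

        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }

The fraction of the MPI_Iallgather duration that can actually be spent in compute_independent_work() is what the thesis's microbenchmarks measure as overlap; offloading aims to keep that fraction high without requiring the host to poll for progress.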

