Performance Optimization and Parallelization of Turbo Decoding for Software-Defined Radio

Loading...
Thumbnail Image

Authors

Roth, Jonathan

Date

2009-09-26T21:36:43Z

Type

thesis

Language

eng

Keyword

Parallel Algorithm , Turbo Decoder , Software-Defined Radio , Hardware Simulation

Research Projects

Organizational Units

Journal Issue

Alternative Title

Abstract

Research indicates that multiprocessor-based architectures will provide a flexible alternative to hard-wired application-specific integrated circuits (ASICs) suitable to implement the multitude of wireless standards required by mobile devices, while meeting their strict area and power requirements. This shift in design philosophy has led to the software-defined radio (SDR) paradigm, where a significant portion of a wireless standard's physical layer is implemented in software, allowing multiple standards to share a common architecture. Turbo codes offer excellent error-correcting performance, however, turbo decoders are one of the most computationally complex baseband tasks of a wireless receiver. Next generation wireless standards such as Worldwide Interoperability for Microwave Access (WiMAX), support enhanced double-binary turbo codes, which offer even better performance than the original binary turbo codes, at the expense of additional complexity. Hence, the design of efficient double-binary turbo decoder software is required to support wireless standards in a SDR environment. This thesis describes the optimization, parallelization, and simulated performance of a software double-binary turbo decoder implementation supporting the WiMAX standard suitable for SDR. An adapted turbo decoder is implemented in the C language, and numerous software optimizations are applied to reduce its overall computationally complexity. Evaluation of the software optimizations demonstrated a combined improvement of at least 270% for serial execution, while maintaining good bit-error rate (BER) performance. Using a customized multiprocessor simulator, special instruction support is implemented to speed up commonly performed turbo decoder operations, and is shown to improve decoder performance by 29% to 40%. The development of a flexible parallel decoding algorithm is detailed, with multiprocessor simulations demonstrating a speedup of 10.8 using twelve processors, while maintaining good parallel efficiency (above 89%). A linear-log-MAP decoder implementation using four iterations was shown to have 90% greater throughput than a max-log-MAP decoder implementation using eight iterations, with comparable BER performance. Simulation also shows that multiprocessor cache effects do not have a significant impact on parallel execution times. An initial investigation into the use of vector processing to further enhance performance of the parallel decoder software reveals promising results.

Description

Thesis (Master, Electrical & Computer Engineering) -- Queen's University, 2009-09-25 16:22:47.288

Citation

Publisher

License

This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.

Journal

Volume

Issue

PubMed ID

External DOI

ISSN

EISSN