Fault-tolerance Techniques for High-performance Computing (Reprint) (Paperback)

About this item

This timely text presents a comprehensive overview of fault-tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, and silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance (ABFT). Emphasis is placed on analytical performance models. This is followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models.
Features: provides a survey of resilience methods and performance models; examines the various sources of errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.
Edition: Reprint
Number of Pages: 329
Genre: Computers + Internet
Series Title: Computer Communications and Networks
Format: Paperback
Publisher: Springer Verlag
Language: English
Street Date: November 4, 2016
TCIN: 51969391
UPC: 9783319355603
Item Number (DPCI): 248-37-6335
