Probing Weaknesses in GPU Reliability Assessment: A Cross-Layer Approach

Abstract

Due to extensive deployment and heavy usage of GPUs, ensuring the reliability of such devices is crucial.

Current software-based reliability evaluation methodologies, albeit fast, often neglect the intricate hardware complexities of modern GPU designs.

This oversight could result in misleading measurements and misguided decisions regarding protection strategies.

This work breaks new ground by examining well-established vulnerability assessment methods for modern GPU architectures, from the microarchitecture all the way to the software layers.

It highlights divergences between popular software-based vulnerability evaluation methods and the ground-truth cross-layer evaluation, and shows that these divergences persist even when strong protection such as triple modular redundancy is employed. Accurate evaluation therefore requires considering the fault distribution from hardware to software.

Our comprehensive measurements offer valuable insights into accurately assessing GPU reliability.

Keywords: reliability assessment; GPUs; cross-layer

Authors