Tutorial: A practical approach to performance analysis and modeling of large-scale systems

To be held at the 11th International Symposium on High-Performance Computer Architecture (HPCA-11) in San Francisco.

Saturday Afternoon, February 12th, 2005.


Tutorial Organizers

Adolfy Hoisie and Darren Kerbyson
Los Alamos National Laboratory


Goals

This tutorial is aimed directly at making performance modeling accessible to both beginners and those with experience in application performance analysis on large systems. The approach is practical in nature, requiring the analysis of systems and of applications.

Attendees will gain: an understanding of the system performance issues and characteristics that most influence performance; an understanding of the application characteristics that most influence performance; and an approach that can be taken from the tutorial and applied to their own scenarios. The tutorial does not require the use or learning of new tools; rather, it is complementary to the use of any specific tool. That is, it will allow attendees to apply the approach in their own setting with their preferred performance tool environment.


Target Audience

This tutorial is intended for a mixture of computational scientists, computer scientists, and code developers interested in understanding the observed performance when using “real-life” applications on high-performance systems. By carefully defining terms and metrics, the tutorial should present no barriers to this diverse audience, and it will provide an in-depth understanding of the issues that is relevant to all backgrounds. The tutorial will also be of interest to those trying to define needs for future-generation, high-end computing systems from either a procurer’s or a designer’s point of view.

Content Level: 30% Beginner, 50% Intermediate, 20% Advanced.

Description

This tutorial will present an integrated approach to the modeling of application performance in a system-independent manner. This unique approach to analysis and modeling, developed over the last few years in the Performance and Architecture Laboratory (PAL) at Los Alamos, has been used and refined extensively. It has been highly successful in the modeling of many large-scale applications on a range of terascale systems. It can be applied to systems or applications that exist, or to those that are under design or being proposed.

The overarching goal is to understand the expected performance of a particular algorithm or application when mapped onto a given HPC platform. Performance modeling is the only technique that can quantitatively elucidate this mapping. Through this tutorial it will be shown how performance modeling can be used to provide insight in such key areas as:

  • estimating accurately the overall workload performance that can be expected from a prospective new computer system;
  • distinguishing system ‘glitches’ from true application performance issues;
  • accurately identifying the performance bottlenecks in existing systems;
  • providing a tuning “roadmap” to application developers; and
  • enabling “point-design” studies for computer architects designing new systems.

It will be shown how an analytically based modeling approach can be used to explore the performance of different architectural scenarios with reasonable accuracy and within reasonable time constraints. This approach does not require prior knowledge of specific tools, or lengthy simulation or evaluation processes. Analytically based performance prediction has been shown to be a valuable tool in successfully providing performance expectations for many of the compute-intensive applications in the ASCI workload.

The tutorial encompasses important definitions for analyzing performance, and also rigorous performance metrics for both serial and parallel considerations. The main content of the tutorial will be split between two aspects:

System characteristics
This includes the computational capability of a single processing node – the CPU, its functional capability, the memory hierarchy (cache configuration, memory bus speeds etc.), node configuration (PEs per node, shared resources etc.), and inter-processor communication (latency, bandwidth, topology, contention effects).
Workload characteristics
This includes the resources that are used by the applications, their frequency, their potential for resource contention, scalability effects etc.
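
As a rough illustration of how the two sets of characteristics above can be combined, the following is a minimal sketch of an analytical model, assuming a simple "compute time plus communication time" form with a latency/bandwidth cost per message. It is not the model presented in the tutorial; the class names, parameter names, and the function predict_time are hypothetical and chosen only for this example, and effects such as contention, topology, and the memory hierarchy are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class System:
    """Hypothetical system characteristics (illustrative only)."""
    flops_per_pe: float       # sustained floating-point rate per PE, in flop/s
    latency_s: float          # per-message latency, in seconds
    bandwidth_bps: float      # point-to-point bandwidth, in bytes/s


@dataclass
class Workload:
    """Hypothetical per-iteration workload characteristics."""
    total_flops: float        # floating-point operations for the whole problem
    messages_per_pe: int      # messages each PE sends per iteration
    bytes_per_message: float  # size of each message, in bytes


def predict_time(system: System, workload: Workload, pes: int) -> float:
    """Estimate one iteration's run time as compute time plus communication time.

    Assumes perfect load balance and no overlap of computation with
    communication; a single PE is assumed to send no messages.
    """
    compute = workload.total_flops / (pes * system.flops_per_pe)
    per_message = system.latency_s + workload.bytes_per_message / system.bandwidth_bps
    communicate = workload.messages_per_pe * per_message if pes > 1 else 0.0
    return compute + communicate


if __name__ == "__main__":
    # Made-up parameter values, not measurements of any real machine.
    system = System(flops_per_pe=1e9, latency_s=1e-5, bandwidth_bps=1e8)
    workload = Workload(total_flops=8e9, messages_per_pe=4, bytes_per_message=1e6)
    for pes in (1, 8, 64):
        print(pes, predict_time(system, workload, pes))
```

Because the communication term in this sketch does not shrink as PEs are added while the compute term does, even such a simple model shows where scaling stops paying off; a realistic model would also let the message count and size depend on the PE count and the data decomposition.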

The approach will be exemplified throughout the tutorial by the use of real-world applications. We will not emphasize any particular machine but will use examples including Compaq Alpha HPC systems, Blue Gene/L, and cluster systems. In particular, detailed case studies will be given based on real experiences with large-scale applications. The applications included cover a wide spectrum of performance characteristics: a structured grid application (Sweep3D), an adaptive mesh refinement (AMR) application (SAGE), and an ocean modeling application (POP). The formation of models of these codes will be detailed, along with techniques that can be applied to identify and understand relevant performance issues. Importantly, the value of the performance modeling approach will be illustrated through the use of the models in the following ways:

  • Prediction of new system configurations (using Blue Gene/L as an example)
  • Verification of achieved system performance (experiences from the installation of ASCI Q)
  • Large-scale system comparison (Earth Simulator vs. other terascale systems)
  • Prediction of possible future systems (100TF systems and beyond)


Tutorial Contact Information

Darren Kerbyson
CCS-3, MS B256
P.O. Box 1663
Los Alamos
NM 87545
Tel: +1 (505) 667-4913
Fax: +1 (505) 667-1126
Email: djk@lanl.gov