HPC Institute, University of Tennessee - Project and Hardware

posted Friday, October 20, 2006 4:56 PM by DennisCr

High Performance Compute Clustering with Windows

University of Tennessee

Innovative Computing Laboratory

Computer Science Department

Jack Dongarra

Windows Cluster Project

People

Jack Dongarra

George Bosilca

Dave Cronk

Julien Langou

Piotr Luszczek

Projects:

1. Numerical Linear Algebra Algorithms and Software

a. LAPACK, ScaLAPACK, ATLAS

b. Self Adapting Numerical Software (SANS) Effort

c. Generic Code Optimization

d. LAPACK for Clusters – easy access to clusters

2. Heterogeneous Distributed Computing

a. NetSolve, FT-MPI, Open MPI

3. Performance Evaluation

a. PAPI, HPC Challenge, Top500

4. Software Repositories

a. Netlib

LAPACK

1. Used by Matlab, Mathematica, Numeric Python,…

2. Tuned versions provided by vendors (AMD, Apple, Compaq, Cray, Fujitsu, Hewlett-Packard, Hitachi, IBM, Intel, MathWorks, NAG, NEC, PGI, SUN, Visual Numerics), by Microsoft, and by most Linux distributions (Fedora, Debian, Cygwin, ...).

3. Ongoing work: performance, accuracy, extended precision, ease of use
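
As a rough illustration of the kind of routine LAPACK provides, the sketch below solves a dense linear system by Gaussian elimination with partial pivoting, the approach behind LAPACK's dgesv driver. It is a plain-Python teaching sketch, not LAPACK code; the function name lu_solve is made up for this example, and a tuned LAPACK build would be far faster and more careful numerically.

```python
def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Illustrative only: this mirrors the idea behind a dense LAPACK
    solver, not its blocked, tuned implementation.
    """
    n = len(a)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    x = list(b)
    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        x[k], x[p] = x[p], x[k]
        # Eliminate column k below the diagonal.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            x[i] -= m * x[k]
    # Back substitution on the resulting upper-triangular system.
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (x[i] - s) / a[i][i]
    return x


if __name__ == "__main__":
    print(lu_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```

In practice a client such as Matlab or Numeric Python would call the tuned library routine rather than code like this.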

ScaLAPACK

1. Parallel implementation of LAPACK that scales on parallel hardware from tens to hundreds to thousands of processors

2. Ongoing work: match the functionality of current LAPACK

3. Ongoing work: target new architectures and new parallel environments, for example a port to the Microsoft HPC cluster solution
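
ScaLAPACK spreads a matrix over a grid of processes using a block-cyclic layout. The sketch below shows the index arithmetic for that layout under simple assumptions: 0-based indices and the first block placed on process 0. The function names owner_and_local and owner_2d are hypothetical helpers for this example, not ScaLAPACK routines.

```python
def owner_and_local(g, nb, nprocs):
    """Map a 0-based global index to (owning process, 0-based local index).

    Assumes a 1D block-cyclic layout with block size nb over nprocs
    processes, with the first block stored on process 0.
    """
    block = g // nb                      # which block the index falls in
    proc = block % nprocs                # blocks are dealt out round-robin
    local = (block // nprocs) * nb + g % nb
    return proc, local


def owner_2d(i, j, mb, nb, prow, pcol):
    """Return the (process row, process column) owning matrix entry (i, j).

    Rows use block size mb over prow process rows; columns use block
    size nb over pcol process columns.
    """
    return owner_and_local(i, mb, prow)[0], owner_and_local(j, nb, pcol)[0]


if __name__ == "__main__":
    # Block size 2 over 3 processes: indices 0..7 map to 0,0,1,1,2,2,0,0.
    print([owner_and_local(g, 2, 3)[0] for g in range(8)])
```

Dealing blocks out cyclically keeps the work balanced as a factorization proceeds and the active part of the matrix shrinks.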

LAPACK for Clusters (LFC)

1. Makes most ScaLAPACK functionality available from serial clients (Matlab, Python, Mathematica)

FT-MPI and Open MPI

1. Defines the behavior of MPI in the event that a failure occurs at the process level.

2. FT-MPI is based on MPI 1.2 (plus some MPI 2 features), with a fault-tolerant model similar to what was done in PVM.

3. Complete reimplementation, not based on other implementations.

a. Gives the application the possibility to recover from a process-failure.

b. A regular, non fault-tolerant MPI program will run using FT-MPI.

4. What FT-MPI does not do:

a. Recover user data (e.g. automatic check-pointing)

b. Provide transparent fault-tolerance
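
Because the library does not recover user data, the application has to keep its own checkpoints and roll back to them after a failure. The sketch below simulates that division of labor in plain Python. It makes no MPI or FT-MPI calls; ProcessFailure and run_with_recovery are hypothetical names invented for this illustration.

```python
class ProcessFailure(Exception):
    """Stand-in for the error an application would see when a peer dies."""


def run_with_recovery(n_steps, checkpoint_every, fail_at):
    """Sum 1..n_steps, surviving simulated failures at the steps in fail_at.

    Returns (total, number of recoveries performed).
    """
    pending_failures = set(fail_at)
    checkpoint = (0, 0)              # (last completed step, running total)
    step, total = checkpoint
    recoveries = 0
    while step < n_steps:
        try:
            nxt = step + 1
            if nxt in pending_failures:
                pending_failures.discard(nxt)   # each failure fires once
                raise ProcessFailure(f"simulated failure at step {nxt}")
            total += nxt
            step = nxt
            if step % checkpoint_every == 0:
                checkpoint = (step, total)      # application-level checkpoint
        except ProcessFailure:
            # The job survives the failure, but restoring data is the
            # application's job: roll back to the last checkpoint.
            step, total = checkpoint
            recoveries += 1
    return total, recoveries


if __name__ == "__main__":
    print(run_with_recovery(10, 3, [5, 8]))
```

Work done since the last checkpoint is repeated after each failure, so the checkpoint interval trades checkpointing cost against recomputation.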

Performance Application Programming Interface (PAPI)

1. A portable library to access hardware counters found on processors

2. Provides a standardized list of performance metrics

KOJAK (Joint with Felix Wolf)

1. Software package for the automatic performance analysis of parallel applications

2. Message passing and multi-threading (MPI and/or OpenMP)

3. Parallel performance

4. CPU and memory performance

Posters for Related Projects

· FT-MPI

· HPCC

· Kojak

· LAPACK / ScaLAPACK

· NetSolve / ActiveSheets

· NetSolve / .NET

· Open MPI

· PAPI

· Top500