1.25 - 1.30 Opening remarks
1.30 - 2.00 Jack Dongarra (University of Tennessee, Oak Ridge National Laboratory, University of Manchester) Abstract, Slides.
2.00 - 2.30 Jos Martin (MathWorks) Abstract, Slides.
2.30 - 3.00 Craig Lucas (Numerical Algorithms Group Ltd) Abstract, Slides.
3.00 - 3.30 Coffee break
3.30 - 4.00 Dimitrios Nikolopoulos (Queen's University Belfast) Abstract, Slides.
4.00 - 4.30 Jonathan Hogg (Rutherford Appleton Laboratory) Abstract, Slides.
4.30 - 5.00 Yuji Nakatsukasa (University of Manchester) Abstract, Slides.
6.00 - Dinner at Tai Pan
In this talk we examine how high-performance computing has changed over the last ten years and look toward the future in terms of trends. These changes have had, and will continue to have, a major impact on our software. Some of the software and algorithm challenges have already been encountered, such as management of communication and memory hierarchies through a combination of compile-time and run-time techniques, but the increased scale of computation, depth of memory hierarchies, range of latencies, and increased run-time environment variability will make these problems much harder.
We will look at five areas of research that will have an important impact on the development of software and algorithms.
We will focus on the following themes:
• Redesign of software to fit multicore and hybrid architectures
• Automatically tuned application software
• Exploiting mixed precision for performance
• The importance of fault tolerance
• Communication avoiding algorithms
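The mixed-precision theme can be illustrated by classical iterative refinement: do the expensive O(n³) factorization work in single precision, then recover double-precision accuracy with cheap residual corrections. A minimal NumPy sketch follows; the test system, shift, and iteration count are invented for illustration, and the explicit float32 inverse merely stands in for a single-precision LU factorization.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test system
b = rng.standard_normal(n)

# Do the O(n^3) work in single precision.  An explicit float32 inverse
# stands in here for a single-precision LU factorization.
M = np.linalg.inv(A.astype(np.float32))

# Initial solve in single precision, promoted back to double.
x = (M @ b.astype(np.float32)).astype(np.float64)

# Iterative refinement: residual in double precision, correction in single.
for _ in range(5):
    r = b - A @ x
    x = x + (M @ r.astype(np.float32)).astype(np.float64)
```

For well-conditioned systems each sweep shrinks the error by roughly cond(A) times single-precision unit roundoff, so a handful of cheap corrections reaches double-precision accuracy while the dominant cost ran at single-precision speed.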
MATLAB is an environment in which many different users undertake computational tasks such as data analysis, simulation, and modelling. These users span many different academic disciplines and industries, and in general they are far more interested in getting their science or engineering task completed quickly than in considering the details of the parallel numerical algorithms being used. In this talk we will consider how to design a software system that presents these kinds of fast algorithms for broad user adoption. We will look at some of the design trade-offs that need to be made, consider the usability aspects of the system, and describe the engineering methodologies we use to build such a system.
In this talk we will consider how we can deliver state-of-the-art parallel numerical algorithms in a robust commercial library. We look at how we might meet this challenge on current and emerging architectures including multicore processors, GPUs and the Intel MIC. We discuss some of our collaborations with academic software developers, the use of OpenMP, and how future versions of this standard may allow us to exploit much more parallelism in numerical linear algebra. We will ask the important question of how the user will interact with such libraries and whether it is desirable, or even practical, to hide the complexity of parallelism from them.
We present BDDT, a task-parallel runtime system that dynamically discovers and resolves dependencies among parallel tasks. BDDT allows the programmer to specify detailed task footprints on any memory address range, multidimensional array tile or dynamic region. BDDT uses a block-based dependence analysis with arbitrary granularity. The analysis is applicable to existing C programs without having to restructure object or array allocation, and provides flexibility in array layouts and tile dimensions.
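The block-based dependence analysis can be sketched in miniature. The toy Python scheduler below is not the BDDT API (the class, names, and granularity are invented for illustration): it maps each task's footprint onto fixed-size blocks and records read-after-write, write-after-write, and write-after-read conflicts on each block.

```python
BLOCK = 64  # granularity of the dependence analysis, in array elements

def blocks(start, length):
    """Block indices covered by a footprint [start, start + length)."""
    return range(start // BLOCK, (start + length - 1) // BLOCK + 1)

class Scheduler:
    def __init__(self):
        self.last_writer = {}  # block -> task that last wrote it
        self.readers = {}      # block -> tasks reading since that write
        self.deps = {}         # task -> tasks it must wait for

    def submit(self, task, footprints):
        """footprints: list of (start, length, mode), mode 'r' or 'w'."""
        self.deps[task] = set()
        for start, length, mode in footprints:
            for blk in blocks(start, length):
                w = self.last_writer.get(blk)
                if w is not None:
                    self.deps[task].add(w)      # RAW / WAW dependence
                if mode == 'w':
                    self.deps[task] |= self.readers.get(blk, set())  # WAR
                    self.last_writer[blk] = task
                    self.readers[blk] = set()
                else:
                    self.readers.setdefault(blk, set()).add(task)
        self.deps[task].discard(task)
        return self.deps[task]
```

For example, a task reading elements 64-127 depends on an earlier task that wrote elements 0-127, because their footprints share a block; tasks with disjoint blocks run in parallel. Coarsening BLOCK trades analysis cost against false dependencies between tasks that share a block but not actual data.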
Sparse direct linear solvers are frequently used in both mathematical optimization and finite element methods, often on desktop machines rather than large clusters. While several well-established solvers exist, research avenues in parallel direct solvers are far from exhausted.
In particular, challenges remain relating to memory optimizations and the development of practical communication minimization techniques. Of particular interest is achieving strong scaling: good speedup for a fixed size of problem as the number of cores increases. Interior point methods in particular often solve hundreds of linear systems during their execution, and are expected to do so within a few seconds at most.
We will summarise what we see as the main challenges and present our current approach to tackling some of them.
Spectral divide and conquer algorithms solve the eigenvalue problem by recursively computing an invariant subspace for a subset of the spectrum and using it to decouple the problem into two smaller subproblems.
A number of such algorithms have been developed over the last forty years, often motivated by parallel computing and, most recently, with the aim of achieving minimal communication costs.
However, none of the existing algorithms has been proved to be backward stable, and they all have a significantly higher arithmetic cost than the standard algorithms currently used. We present new spectral divide and conquer algorithms for the symmetric eigenvalue problem and the singular value decomposition that are backward stable, achieve lower bounds on communication costs, and have operation counts within a small (<3) constant factor of those for the standard algorithms.
The new algorithms are built on the polar decomposition and use the building blocks of QR factorization and matrix multiplication. The algorithms have great potential for efficient, numerically stable computations on computing architectures where the cost of communication dominates the cost of arithmetic. Time permitting, I will discuss an extension to generalized eigenvalue problems.
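The decoupling step described above can be sketched in a few lines of NumPy. This is an illustration only, not the talk's algorithm: the polar factor is computed here via the SVD for brevity (whereas the new algorithms use QR-based iterations precisely to avoid this), and the test matrix, shift, and eigenvalues are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
Qr, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Qr @ np.diag([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]) @ Qr.T  # symmetric test matrix

sigma = 0.0                   # chosen splitting point in the spectrum
# Orthogonal polar factor of A - sigma*I (computed via the SVD only for
# brevity); for symmetric A it equals the matrix sign of A - sigma*I.
W, _, Vt = np.linalg.svd(A - sigma * np.eye(n))
U = W @ Vt
P = (U + np.eye(n)) / 2       # spectral projector onto eigenvalues > sigma
k = int(round(np.trace(P)))   # dimension of that invariant subspace

# Orthonormal bases for range(P) and its orthogonal complement.
V1, _ = np.linalg.qr(P @ rng.standard_normal((n, k)))
V2, _ = np.linalg.qr((np.eye(n) - P) @ rng.standard_normal((n, n - k)))

A1 = V1.T @ A @ V1            # subproblem carrying the eigenvalues > sigma
A2 = V2.T @ A @ V2            # subproblem carrying the eigenvalues < sigma
# V1.T @ A @ V2 is numerically zero: the problem decouples, and one
# recurses on the smaller symmetric matrices A1 and A2.
```

Note that apart from the illustrative SVD, the sketch uses only QR factorizations and matrix multiplications, the communication-friendly building blocks the abstract highlights.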