19th Workshop on high performance computing in meteorology

Optimization of ACRANEB2 radiation kernel on Intel Xeon Processors

Speaker

Vamsi Sripathi (Intel)

Description

In this talk, we describe the optimization techniques used to accelerate a radiation kernel (ACRANEB2) on Intel Xeon processors. We briefly describe the compute capabilities of the latest Intel processors and their implications for HPC application performance. Our performance analysis of ACRANEB2 identified two top hot regions, involving mathematical functions and prefix-sum calculations. The data dependency in prefix-sum calculations makes it hard for compilers to vectorize the computation and leads to poor out-of-the-box performance. Hence, to use the AVX-512 vector units effectively, we developed an optimized version of prefix-sum using explicit vectorization techniques, which will be discussed in detail in the talk. Our performance results show that the average speed-up of the explicit SIMD prefix-sum over the baseline and OpenMP SIMD implementations is 4.6x (GCC and Clang) and 1.6x (ICC), respectively. For better integration into ACRANEB2, we extended the standalone prefix-sum to a packed implementation that operates on a set of input vectors. In addition, we block and fuse the prefix-sum calculations with math operations to maximize the gains from AVX-512 vectorization. Overall, this led to performance gains of up to 1.3x over the baseline for ACRANEB2 on the latest Intel Xeon processors.
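To illustrate why the prefix-sum is hard to vectorize, here is a minimal sketch in portable C. It is not the authors' implementation and uses no AVX-512 intrinsics. The function names and the `LANES` constant (8, assumed here as the number of doubles in one 512-bit register) are hypothetical. The first routine is the straightforward loop, in which every iteration depends on the previous one. The second emulates one common explicit-SIMD pattern: log2(LANES) shift-and-add steps inside each block, followed by adding a running carry from the preceding blocks.

```c
#include <stddef.h>

#define LANES 8 /* assumption: doubles per 512-bit vector register */

/* Baseline inclusive prefix sum: out[i] = in[0] + ... + in[i]. */
void prefix_sum_scalar(const double *in, double *out, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        acc += in[i]; /* loop-carried dependency on acc */
        out[i] = acc;
    }
}

/* Blocked prefix sum that mimics a SIMD formulation in plain C.
 * Each inner loop over k touches independent lanes, so it maps
 * naturally onto one vector instruction in a real SIMD version. */
void prefix_sum_blocked(const double *in, double *out, size_t n) {
    double carry = 0.0; /* sum of all elements in earlier blocks */
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        double v[LANES], t[LANES];
        for (int k = 0; k < LANES; ++k)
            v[k] = in[i + k];
        /* Shift-and-add steps with shift = 1, 2, 4. */
        for (int shift = 1; shift < LANES; shift <<= 1) {
            for (int k = 0; k < LANES; ++k)
                t[k] = (k >= shift) ? v[k - shift] : 0.0;
            for (int k = 0; k < LANES; ++k)
                v[k] += t[k];
        }
        /* Add the running total from earlier blocks to every lane. */
        for (int k = 0; k < LANES; ++k)
            out[i + k] = v[k] + carry;
        carry = out[i + LANES - 1];
    }
    /* Scalar remainder for lengths that are not a multiple of LANES. */
    for (; i < n; ++i) {
        carry += in[i];
        out[i] = carry;
    }
}
```

The blocked variant performs more additions per element than the scalar loop, but only the carry is sequential across blocks. Because it reorders floating-point additions, its results can differ from the scalar loop in the last bits for general inputs.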

Primary author

Vamsi Sripathi (Intel)

Co-authors

Nitya Hariharan (Intel), Ruchira Sasanka (Intel)
