Pilot video course "Parallel Programming and Optimization for Intel Xeon Phi Coprocessors"
Hi, Habr!
As the title suggests, I am currently creating a video training course on parallel programming and code optimization for high-performance systems based on Intel architectures. Below you will find more information about the course, a list of the topics and lab exercises it covers, and pilot episodes that give an idea of its content and format.
In this module, I use histogram construction as an example to show two optimization techniques that improve the compiler's automatic vectorization of the code. I then present the resulting performance on an Intel Xeon CPU and an Intel Xeon Phi coprocessor.
This course is being filmed for Intel in English and will be used where neither I nor my colleagues can attend in person. The list of topics in this video course is based on our one-day training. The slides for the course are available at the following link: http://research.colfaxinternational.com/post/2014/10/13/CDT-Slides.aspx . When you download the slides, you will be asked for your name and email address. These are used only for internal statistics and will not be added to any mailing list without your consent.
Training topics (in English)
Welcome
About This Document
Disclaimer
Intel Many Integrated Core (MIC) Architecture
Purpose of the Intel MIC Architecture
Details of the MIC Architecture
Software Tools for Intel Xeon Phi Coprocessors
Will My Application Benefit from the MIC Architecture?
Models for Intel Xeon Phi Coprocessor Programming
Overview of Programming Options
Native Coprocessor Applications
Explicit Offload
Data and Memory Buffer Retention
Virtual-Shared Memory Offload Model
Handling Multiple Coprocessors
Heterogeneous Programming with Coprocessors using MPI
File I/O in MPI Applications on Coprocessors
Expressing Parallelism on Intel Architectures
SIMD Parallelism and Automatic Vectorization
Thread Parallelism and OpenMP
Thread Synchronization in OpenMP
Reduction Across Threads: Avoiding Synchronization
Distributed Memory Parallelism and MPI
Summary and Additional Resources
Optimization Using Intel Software Development Tools
Optimization Roadmap
Library Solution: Intel Math Kernel Library (MKL)
Node-Level Tuning with Intel VTune Amplifier XE
Cluster-Level Tuning with Intel Trace Analyzer and Collector
Optimization of Scalar Arithmetics
Compiler-friendly Practices
Accuracy Control
Optimization of Vectorization
Diagnostics and Facilitation of Automatic Vectorization
Vector-friendly Data Structures
Data Alignment for Vectorization
Strip-Mining for Vectorization
Additional Vectorization "Tuning Knobs"
Optimization of Thread Parallelism
Reduction instead of Synchronization
Elimination of False Sharing
Expanding Iteration Space
Controlling Thread Affinity
Optimization of Data Traffic
Memory Access and Cache Utilization
PCIe Traffic Optimization in Offload Applications
MPI Traffic Optimization: Fabric Selection
Optimization of MPI Applications
Load Balancing in Heterogeneous Applications
Inter-Operation with OpenMP
Additional Resources
Course Recap
Knights Landing, the Next Manycore Architecture
Where to get more information
How to Obtain an Intel Xeon Phi Coprocessor
I also plan to include lab exercises that walk through the stages of code optimization step by step on concrete examples. The names of these exercises are listed below.
Lab names (in English)
2.1-native
2.2-explicit-offload
2.3-explicit-offload-persistence
2.4-explicit-offload-matrix
2.5-sharing-complex-objects
2.6-multiple-coprocessors
2.7-asynchronous-offload
2.8-MPI
2.9-openmp4.0
3.1-vectorization
3.2-OpenMP
3.3-Cilk-Plus
3.4-MPI
4.1-vtune
4.2-itac
4.3-serial-optimization
4.4-vectorization-data-structure
4.5-vectorization-compiler-hints
4.6-optimize-shared-mutexes
4.7-optimize-scheduling
4.8-insufficient-parallelism
4.9-affinity
4.a-tiling
4.b-Nbody
4.c-cache-oblivious-recursion
4.d-cache-loop-fusion
4.e-offload
4.f-MPI-load-balance
4.g-hybrid
4.h-MKL
Recording and editing have only just begun, so I would very much like to hear the Habr community's opinion on the questions below. Translating the audio track of a single 10-minute episode into Russian takes me several hours, and there will be 50-60 episodes. I would therefore like to know in advance whether this project is of value to Habr readers. Any constructive criticism of the content or presentation, or simply a comment, is welcome.