Overview of Parallelism Methods

The three major features of parallel programming supported by the Intel® compiler are:

  • OpenMP* API support

  • Auto-parallelization

  • Auto-vectorization

Each of these features contributes to application performance depending on the number of processors, the target architecture (IA-32, Intel® 64, or IA-64 architecture), and the nature of the application. These features can also be combined to further improve application performance.

Parallelism defined with the OpenMP* API is based on thread-level and task-level parallelism. Parallelism defined with auto-parallelization techniques is based on thread-level parallelism (TLP). Parallelism defined with auto-vectorization techniques is based on instruction-level parallelism (ILP).

Parallel programming can be explicit, that is, defined by a programmer using the OpenMP* API and associated options. Parallel programming can also be implicit, that is, detected automatically by the compiler. Implicit parallelism covers auto-parallelization of outermost loops, auto-vectorization of innermost loops, or both.
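
As a minimal sketch of explicit parallelism (the array names and sizes here are illustrative, not taken from this documentation), the following C code marks a loop for thread-level parallel execution with an OpenMP* directive; it assumes the compiler's OpenMP support is enabled at compile time:

#include <stdio.h>

#define N 1000

int main(void)
{
    double a[N], b[N], sum = 0.0;
    int i;

    /* Initialize the input arrays serially. */
    for (i = 0; i < N; i++) {
        a[i] = i * 0.5;
        b[i] = i * 2.0;
    }

    /* Explicit thread-level parallelism: the programmer requests that
       the loop iterations be divided among threads, with the partial
       sums combined by the reduction clause. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++) {
        sum += a[i] * b[i];
    }

    printf("sum = %f\n", sum);
    return 0;
}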

To enhance auto-vectorization, users can also add vectorizer directives to their programs.
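
For example, one such directive accepted by the Intel compiler is #pragma ivdep; the sketch below (the function name scale is illustrative, not from this documentation) uses it to tell the vectorizer to ignore assumed, unproven dependences in the loop that follows:

/* Illustrative helper, not part of the compiler documentation. */
void scale(float *dst, const float *src, float s, int n)
{
    int i;

    /* Vectorizer directive: ignore assumed loop-carried dependences
       in the next loop; proven dependences are still honored. */
    #pragma ivdep
    for (i = 0; i < n; i++) {
        dst[i] = src[i] * s;
    }
}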

Note

Software pipelining (SWP), a technique closely related to auto-vectorization, is available on systems based on IA-64 architecture.

The following table summarizes the different ways in which parallelism can be exploited with the Intel® compiler.

Implicit parallelism (generated by the compiler and by user-supplied hints):

  Auto-parallelization (Thread-Level Parallelism), supported on:

  • IA-32 architecture, Intel® 64 architecture, and IA-64 architecture-based multiprocessor systems and multi-core processors

  • Hyper-Threading Technology-enabled systems

  Auto-vectorization (Instruction-Level Parallelism), supported on:

  • Pentium®, Pentium with MMX™ Technology, Pentium II, Pentium III, and Pentium 4 processors; Intel® Core™ processor, Intel® Core™ 2 processor, and Intel® Atom™ processor

Explicit parallelism (programmed by the user):

  OpenMP* (Thread-Level and Task-Level Parallelism), supported on:

  • IA-32 architecture, Intel® 64 architecture, and IA-64 architecture-based multiprocessor systems and multi-core processors

  • Hyper-Threading Technology-enabled systems
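
As an illustration of the implicit methods above, the loop nest below (an assumed example; the function name add_matrices and the sizes are not from this documentation) needs no source changes: with auto-parallelization enabled (for example, through the compiler's parallelization option, such as -parallel on Linux* or /Qparallel on Windows*), the compiler may thread the outer loop and vectorize the inner loop.

#define ROWS 512
#define COLS 512

/* A candidate loop nest for implicit parallelism: the compiler may run
   iterations of the outer loop on multiple threads (auto-parallelization)
   and use SIMD instructions for the inner loop (auto-vectorization). */
void add_matrices(float c[ROWS][COLS], float a[ROWS][COLS], float b[ROWS][COLS])
{
    int i, j;

    for (i = 0; i < ROWS; i++) {
        for (j = 0; j < COLS; j++) {
            c[i][j] = a[i][j] + b[i][j];
        }
    }
}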

Intel provides performance libraries that contain highly optimized, extensively threaded routines, including the Intel® Math Kernel Library (Intel® MKL).

In addition to these major features supported by the Intel compiler, certain operating systems support application program interface (API) function calls that provide explicit threading controls. For example, Windows* operating systems support API calls such as CreateThread, and multiple operating systems support POSIX* threading APIs.
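
For example, the following sketch creates threads directly through the POSIX* threading API rather than through the compiler (the function name worker and the thread count are illustrative); a Windows* version would use CreateThread in a similar way:

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

/* Work performed by each explicitly created thread. */
static void *worker(void *arg)
{
    int id = *(int *)arg;
    printf("thread %d running\n", id);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];
    int ids[NUM_THREADS];
    int i;

    /* Explicit threading through operating-system APIs instead of
       compiler-generated parallelism. */
    for (i = 0; i < NUM_THREADS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}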

Performance Analysis

For performance analysis of your parallel program, you can use the Intel® VTune™ Performance Analyzer, the Intel® Threading Tools, or both. These tools show which portions of the code require the most time to execute and where parallel performance problems are located.

Threading Resources

For general information about threading an existing serial application or design considerations for creating new threaded applications, see Other Resources and the web site http://go-parallel.com.