Parallel lint can speed up the migration of sequential applications to parallel applications that use OpenMP by reducing development and debugging time. This topic explains how to use parallel lint to check and debug your parallel application. Parallel lint performs static global analysis of a program to diagnose existing and potential issues with parallelization. One advantage of parallel lint is that its checks consider the whole stack of parallel regions and worksharing constructs, even when they are placed in different routines.
Example

     1  parameter (N = 100)
     2  real, dimension(N) :: x,y
     3
     4  !$OMP PARALLEL DEFAULT(SHARED)
     5  !$OMP SECTIONS
     6  !$OMP SECTION
     7  do i = 1, N
     8    call work(x, N, i)
     9    call output(x, N)
    10  end do
    11  !$OMP SECTION
    12  call work(y, N, N)
    13  call output(y, N)
    14  !$OMP END SECTIONS
    15  !$OMP END PARALLEL
    16  print *, x, y
    17  end
    18
    19
    20  subroutine work(x, N, i)
    21  real, dimension(N) :: x
    22  x(i) = i*10.0
    23  end subroutine work
    24
    25  subroutine output(x, N)
    26  real, dimension(N) :: x
    27  !$OMP SINGLE
    28  print *, x
    29  !$OMP END SINGLE
    30  end subroutine output

    tst.f(27): error #12200: SINGLE directive is not allowed in the dynamic extent of SECTIONS directive (file:tst.f line:5)
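One way to resolve the diagnostic above is sketched below. It assumes that SINGLE was only meant to keep several threads from printing the same array; because each SECTION is already executed by exactly one thread, the directive can simply be dropped. The program name, the explicit declarations, and the zero initialization of `x` and `y` are additions made for this sketch and are not part of the original example.

```fortran
      program sections_fix
      implicit none
      integer, parameter :: N = 100
      real, dimension(N) :: x, y
      integer i
! Assumption: give every element a defined value before printing.
      x = 0.0
      y = 0.0
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i)
!$OMP SECTIONS
!$OMP SECTION
      do i = 1, N
         call work(x, N, i)
         call output(x, N)
      end do
!$OMP SECTION
      call work(y, N, N)
      call output(y, N)
!$OMP END SECTIONS
!$OMP END PARALLEL
      end program sections_fix

      subroutine work(x, N, i)
      implicit none
      integer N, i
      real, dimension(N) :: x
      x(i) = i*10.0
      end subroutine work

! No SINGLE here: the routine is called from inside a SECTION,
! which only one thread executes.
      subroutine output(x, N)
      implicit none
      integer N
      real, dimension(N) :: x
      print *, x
      end subroutine output
```

The two sections may still run at the same time, so their printed records can appear in either order.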
This makes parallel lint a powerful tool for diagnosing OpenMP directives in whole-program context. Parallel lint also provides checks that help debug errors caused by data dependencies and race conditions.
Example

     1  parameter (N = 10)
     2  integer i
     3  integer, dimension(N) :: factorial
     4
     5  factorial(1) = 1
     6  !$OMP PARALLEL DO
     7  do i = 2, N
     8    factorial(i) = i * factorial(i-1)
     9  end do
    10  print *, factorial
    11  end

    tst.f(8): warning #12246: flow data dependence from (file:tst.f line:8) to (file:tst.f line:8), due to "FACTORIAL" may lead to incorrect program execution in parallel mode
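The warning above arises because iteration `i` reads `factorial(i-1)`, which a different thread may not have written yet. The sketch below shows one possible way to remove that loop-carried dependence: each iteration rebuilds its own product in a private variable. This trades extra arithmetic for independence and is only an illustration; simply running the original loop sequentially is equally valid. The names `factorial_table`, `j`, and `p` are introduced here and do not come from the original example.

```fortran
      program factorial_table
      implicit none
      integer, parameter :: N = 10
      integer i, j, p
      integer, dimension(N) :: factorial

! Each iteration computes its product in the private variable p,
! so no iteration reads a value written by another iteration.
!$OMP PARALLEL DO PRIVATE(j, p)
      do i = 1, N
         p = 1
         do j = 2, i
            p = p * j
         end do
         factorial(i) = p
      end do
!$OMP END PARALLEL DO
      print *, factorial
      end program factorial_table
```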
To enable parallel lint analysis, pass the /Qdiag-enable:sc-parallel[n] (Windows) or -diag-enable sc-parallel[n] (Linux and Mac OS) option to the compiler.
Parallel lint is available for IA-32 and Intel® 64 architectures only.
Parallel lint requires the OpenMP option: /Qopenmp (Windows) or -openmp (Linux and Mac OS). This option forces the compiler to process OpenMP directives, which makes the parallelization details available to parallel lint analysis. If parallel lint is used without the OpenMP option, the compiler issues the following error message:
command line error: parallel lint not called due to lack of OpenMP
parallelization option, please add option /Qopenmp when using parallel lint.
If you are using Microsoft Visual Studio*, you should create a separate build configuration devoted to parallel lint, since object and library files produced by parallel lint should not be used to build your product.
Parallel lint provides a broad set of OpenMP checks which are useful both for beginners in parallel programming using OpenMP and for advanced parallel developers. See the Overview section of this manual.
The examples below highlight the most useful features of parallel lint.
An OpenMP program is much more difficult to debug if it has nested parallel regions. Various restrictions apply to nested parallel constructs. Parallel lint can check nested parallel statements even if they are located in different files.
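The example below violates the rule that a worksharing construct may not be closely nested inside a worksharing, CRITICAL, ORDERED, or MASTER construct. Here the worksharing loop is in a subroutine that is called from inside a MASTER construct.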
Example

     1  parameter (N = 10)
     2  real, dimension(N,N) :: x, y, z
     3  x = 1.0
     4  y = 2.0
     5  !$OMP PARALLEL DEFAULT(SHARED)
     6  !$OMP MASTER
     7  call work(x, y, z, N)
     8  !$OMP END MASTER
     9  !$OMP END PARALLEL
    10  print *, z
    11  end
    12
    13  subroutine work(x, y, z, N)
    14  real, dimension(N,N) :: x, y, z
    15  !$OMP DO
    16  do i = 1, N
    17    do j = 1, N
    18      z(i,j) = x(i,j) + y(j,i)
    19    end do
    20  end do
    21  end subroutine work

    tst.f(15): error #12200: LOOP directive is not allowed in the dynamic extent of MASTER directive (file:tst.f line:6)
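A possible correction for the example above is sketched here, under the assumption that the intent was to divide the loop among all threads of the parallel region. The MASTER construct is removed so that every thread calls `work` and encounters the DO construct. The program name and the explicit declarations are additions for this sketch.

```fortran
      program nested_fix
      implicit none
      integer, parameter :: N = 10
      real, dimension(N,N) :: x, y, z
      x = 1.0
      y = 2.0
! Every thread of the team calls work, so the orphaned DO
! construct inside it binds to this parallel region.
!$OMP PARALLEL DEFAULT(SHARED)
      call work(x, y, z, N)
!$OMP END PARALLEL
      print *, z
      end program nested_fix

      subroutine work(x, y, z, N)
      implicit none
      integer N, i, j
      real, dimension(N,N) :: x, y, z
! Iterations of the i loop are divided among the threads.
!$OMP DO PRIVATE(j)
      do i = 1, N
         do j = 1, N
            z(i,j) = x(i,j) + y(j,i)
         end do
      end do
!$OMP END DO
      end subroutine work
```

If the work really must be done by the master thread alone, the alternative is to keep MASTER and remove the DO directive from the subroutine.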
Parallelization of an existing serial application requires accurate placement of data sharing clauses. Parallel lint can help determine not only improper usage of sharing clauses but also lack of proper data sharing directives.
The example below demonstrates the OpenMP standard restriction: "If the LASTPRIVATE clause is used on a construct to which NOWAIT is also applied, then the original list item remains undefined until a barrier synchronization has been performed to ensure that the thread that executed the sequentially last iteration, or the lexically last SECTION construct, has stored that list item." [OpenMP standard]
Example

     1  integer, parameter :: N=10
     2  integer last, i
     3  real, dimension(N) :: a, b, c
     4  b = 10.0
     5  c = 50.0
     6  !$OMP PARALLEL SHARED(a, b, c, last)
     7  !$OMP DO LASTPRIVATE(last)
     8  do i = 1, N
     9    a(i) = b(i) + c(i)
    10    last = i
    11  end do
    12  !$OMP END DO NOWAIT
    13  !$OMP SINGLE
    14  call sub(last)
    15  !$OMP END SINGLE
    16  !$OMP END PARALLEL
    17  end
    18
    19  subroutine sub(last)
    20  integer last
    21  print *, last
    22  end subroutine sub

    tst.f(14): error #12220: LASTPRIVATE variable "LAST" in NOWAIT work-sharing construct is used before barrier synchronization
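The restriction quoted above can be satisfied by making sure a barrier separates the loop from the first use of `last`. The sketch below takes the simplest route and drops NOWAIT, so the implied barrier at END DO guarantees that the value from the final iteration has been stored. The program name is made up for this sketch, and the call to `sub` is replaced by a direct `print` to keep it short.

```fortran
      program lastprivate_fix
      implicit none
      integer, parameter :: N = 10
      integer last, i
      real, dimension(N) :: a, b, c
      b = 10.0
      c = 50.0
!$OMP PARALLEL SHARED(a, b, c, last)
!$OMP DO LASTPRIVATE(last)
      do i = 1, N
         a(i) = b(i) + c(i)
         last = i
      end do
! No NOWAIT: the implied barrier here makes last well defined.
!$OMP END DO
!$OMP SINGLE
      print *, last
!$OMP END SINGLE
!$OMP END PARALLEL
      end program lastprivate_fix
```

Keeping NOWAIT and placing an explicit `!$OMP BARRIER` before the SINGLE construct would have the same effect.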
The next example demonstrates another OpenMP standard restriction: "Private pointers that become allocated during the execution of parallel region should be explicitly deallocated by the program prior to the end of parallel region to avoid memory leaks."
Example

     1  integer :: OMP_GET_THREAD_NUM
     2  integer, pointer :: ptr
     3  integer, pointer :: a(:)
     4
     5  call OMP_SET_NUM_THREADS(2)
     6  allocate(ptr)
     7  allocate(a(2))
     8  ptr = 5
     9  print *, ptr
    10  !$OMP PARALLEL PRIVATE(ptr) SHARED(a)
    11  allocate(ptr)
    12  ptr = 3
    13  !$OMP CRITICAL
    14  a(OMP_GET_THREAD_NUM()+1) = ptr
    15  !$OMP END CRITICAL
    16  !$OMP END PARALLEL
    17  print *, a
    18  end

    as_44_1.f(11): error #12359: private pointer "PTR" should be explicitly deallocated by the program prior to the end of parallel region (file:as_44_1.f line:10) to avoid memory leaks.
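A minimal sketch of the fix for the example above: each thread deallocates its private copy of `ptr` before the parallel region ends. The program name, the `a = 0` initialization (so that `a` is fully defined even if the runtime provides fewer than two threads), and the final cleanup of the original pointers are additions for this sketch.

```fortran
      program pointer_fix
      implicit none
      integer :: OMP_GET_THREAD_NUM
      integer, pointer :: ptr
      integer, pointer :: a(:)

      call OMP_SET_NUM_THREADS(2)
      allocate(ptr)
      allocate(a(2))
      a = 0
      ptr = 5
      print *, ptr
!$OMP PARALLEL PRIVATE(ptr) SHARED(a)
      allocate(ptr)
      ptr = 3
!$OMP CRITICAL
      a(OMP_GET_THREAD_NUM()+1) = ptr
!$OMP END CRITICAL
! Release the per-thread target before the private pointer
! goes out of scope at the end of the region.
      deallocate(ptr)
!$OMP END PARALLEL
      print *, a
! The original ptr and a are unaffected by the region.
      deallocate(a)
      deallocate(ptr)
      end program pointer_fix
```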
Data dependency issues are very difficult to debug in parallel programs due to non-deterministic behavior. Parallel lint is able to determine data dependency issues in programs without executing them.
To turn on data dependency analysis, specify severity level 3 in the parallel lint diagnostics option, that is, n = 3 in sc-parallel[n].
Example

     1  integer i, a(4)
     2  !$OMP PARALLEL DO SHARED(i) NUM_THREADS(4)
     3  do i=1,4
     4    a(i) = loc(i)
     5  end do
     6  !$OMP END PARALLEL DO
     7  print *,a
     8  end

    tst.f(3): warning #12246: flow data dependence from (file:tst.f line:3) to (file:tst.f line:3), due to "I" may lead to incorrect program execution in parallel mode
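In the example above, the dependence comes from forcing the loop index to be SHARED. A minimal sketch of the usual remedy is to leave the index out of the SHARED clause, so that each thread works on its own copy. The program name is made up, and the nonstandard `loc(i)` call is replaced by plain arithmetic as an assumption, so that the sketch stays within standard Fortran.

```fortran
      program loop_index_fix
      implicit none
      integer i, a(4)
! The loop index i is not listed in SHARED, so it keeps its
! predetermined PRIVATE attribute inside the PARALLEL DO.
!$OMP PARALLEL DO NUM_THREADS(4)
      do i = 1, 4
         a(i) = 10*i
      end do
!$OMP END PARALLEL DO
      print *, a
      end program loop_index_fix
```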
Example

     1  integer a(1000)
     2  !$OMP THREADPRIVATE(a)
     3  integer i, sum
     4
     5  !$OMP PARALLEL DO
     6  do i=1,1000
     7    a(i) = i
     8  end do
     9  !$OMP END PARALLEL DO
    10  !$OMP PARALLEL DO REDUCTION(+:sum)
    11  do i=10,1000
    12    sum = sum + a(i)
    13  end do
    14  !$OMP END PARALLEL DO
    15  print *,sum
    16  end

    tst.f(12): error #12344: THREADPRIVATE variable "A" is used in loops with different initial values. See loops (file:tst.f line:6) and (file:tst.f line:11).
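In the example above, each thread fills part of its own THREADPRIVATE copy of `a`, but the second loop starts at a different index and so may hand a thread elements that it never wrote. The sketch below assumes the array was meant to be common to all threads and simply drops THREADPRIVATE. It also initializes the accumulator, which the original leaves undefined, and renames it to `total` (a name introduced here) to avoid shadowing the `sum` intrinsic.

```fortran
      program shared_array_fix
      implicit none
      integer a(1000)
      integer i, total

! a is shared by default, so every thread sees all elements.
!$OMP PARALLEL DO
      do i = 1, 1000
         a(i) = i
      end do
!$OMP END PARALLEL DO
      total = 0
!$OMP PARALLEL DO REDUCTION(+:total)
      do i = 10, 1000
         total = total + a(i)
      end do
!$OMP END PARALLEL DO
      print *, total
      end program shared_array_fix
```

With a shared array the result no longer depends on how iterations are distributed: the sum of the integers 10 through 1000 is 500455.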
Reductions are widely used in parallel programming, but they are subject to many restrictions, some explicit and some less obvious. Parallel lint helps avoid potential problems with reductions. The example below illustrates an explicit constraint from the OpenMP API: variables that appear in a REDUCTION clause must be SHARED in the enclosing context.
Example

     1  integer i, j
     2  real a
     3
     4  !$OMP PARALLEL PRIVATE(a)
     5  do i = 1, 10
     6    call sub(a,i)
     7  end do
     8  !$OMP SINGLE
     9  print *, a
    10  !$OMP END SINGLE
    11  !$OMP END PARALLEL
    12  end
    13
    14  subroutine sub(a,i)
    15  integer i
    16  real a
    17  !$OMP DO REDUCTION(+: a)
    18  do j = 1, 10
    19    a = a + i + j
    20  end do
    21  end subroutine sub

    as_35_1.f(17): error #12208: variable "A" must be SHARED in the enclosing context since it is specified in a REDUCTION clause at (file:as_35_1.f line:4)
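A sketch of how the example above could be corrected: `a` is made SHARED in the enclosing parallel region, which is what the REDUCTION clause in `sub` requires. The program name, the explicit declarations, and the initialization of `a` (undefined in the original) are additions for this sketch.

```fortran
      program reduction_fix
      implicit none
      integer i
      real a

      a = 0.0
! a is SHARED here, as the REDUCTION clause in sub requires.
!$OMP PARALLEL SHARED(a) PRIVATE(i)
      do i = 1, 10
         call sub(a, i)
      end do
!$OMP SINGLE
      print *, a
!$OMP END SINGLE
!$OMP END PARALLEL
      end program reduction_fix

      subroutine sub(a, i)
      implicit none
      integer i, j
      real a
! The j iterations are divided among the threads; the partial
! sums are combined into a at the end of the construct.
!$OMP DO REDUCTION(+: a)
      do j = 1, 10
         a = a + i + j
      end do
!$OMP END DO
      end subroutine sub
```

Every thread runs the outer `i` loop and meets the DO construct ten times; the implied barrier at each END DO completes one reduction before the next begins. Each `(i, j)` pair therefore contributes once, giving a total of 1100.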