Verifying OpenMP* Using Parallel Lint

Parallel lint can speed up the migration of sequential applications to OpenMP by reducing development and debugging time. This topic explains how to use parallel lint to verify your parallel application. Parallel lint performs static global analysis of a program to diagnose existing and potential parallelization issues. One of its advantages is that its checks consider the whole stack of parallel regions and worksharing constructs, even when they are placed in different routines. In the example below, a SINGLE construct in subroutine output is reached from a SECTIONS construct in the main program.

Example

1 parameter (N = 100)

2 real, dimension(N) :: x,y

3

4 !$OMP PARALLEL DEFAULT(SHARED)

5 !$OMP SECTIONS

6 !$OMP SECTION

7 do i = 1, N

8 call work(x, N, i)

9 call output(x, N)

10 end do

11 !$OMP SECTION

12 call work(y, N, N)

13 call output(y, N)

14 !$OMP END SECTIONS

15 !$OMP END PARALLEL

16 print *, x, y

17 end

18

19

20 subroutine work(x, N, i)

21 real, dimension(N) :: x

22 x(i) = i*10.0

23 end subroutine work

24

25 subroutine output(x, N)

26 real, dimension(N) :: x

27 !$OMP SINGLE

28 print *, x

29 !$OMP END SINGLE

30 end subroutine output

tst.f(27): error #12200: SINGLE directive is not allowed in

the dynamic extent of SECTIONS directive (file:tst.f line:5)

This makes parallel lint a powerful tool for diagnosing OpenMP directives in whole program context. Parallel lint also provides checks to debug errors connected with data dependencies and race conditions.

Example

1 parameter (N = 10)

2 integer i

3 integer, dimension(N) :: factorial

4

5 factorial(1) = 1

6 !$OMP PARALLEL DO

7 do i = 2, N

8 factorial(i) = i * factorial(i-1)

9 end do

10 print *, factorial

11 end

tst.f(8): warning #12246: flow data dependence from

(file:tst.f line:8) to (file:tst.f line:8), due to

"FACTORIAL" may lead to incorrect program execution in parallel mode

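One way to remove the dependence reported above is to make every iteration self-contained. The following is a minimal sketch, not part of the original example: each iteration recomputes its own product in a private accumulator (the names p and k are introduced here for illustration), so no iteration reads a value written by another. It trades extra arithmetic for independence; for a recurrence this cheap, leaving the loop serial is an equally valid fix.

```fortran
program factorial_independent
  implicit none
  integer, parameter :: N = 10
  integer :: i, k, p
  integer, dimension(N) :: factorial

  ! Each iteration builds 1*2*...*i in its own private accumulator p,
  ! so it never reads factorial(i-1) written by another iteration.
  !$omp parallel do private(k, p)
  do i = 1, N
     p = 1
     do k = 2, i
        p = p * k
     end do
     factorial(i) = p
  end do
  !$omp end parallel do

  ! Serial cross-check against the original recurrence.
  do i = 2, N
     if (factorial(i) /= i * factorial(i-1)) error stop 1
  end do

  print '(I8)', factorial(N)
end program factorial_independent
```
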
Basics of Compilation

To enable parallel lint analysis, pass the /Qdiag-enable:sc-parallel[n] (Windows) or -diag-enable sc-parallel[n] (Linux and Mac OS) option to the compiler.

Parallel lint is available for IA-32 and Intel® 64 architectures only.

Parallel lint requires the OpenMP option, /Qopenmp (Windows) or -openmp (Linux and Mac OS). This option forces the compiler to process OpenMP directives so that parallelization specifics are available for parallel lint analysis. If parallel lint is used without the OpenMP option, the compiler issues the following error message:

command line error: parallel lint not called due to lack of OpenMP

parallelization option, please add option /Qopenmp when using parallel lint.

If you are using Microsoft Visual Studio*, you should create a separate build configuration devoted to parallel lint, since object and library files produced by parallel lint should not be used to build your product.

Basic Checks

Parallel lint provides a broad set of OpenMP checks which are useful both for beginners in parallel programming using OpenMP and for advanced parallel developers. See the Overview section of this manual.

The examples below highlight the most useful features of parallel lint.

Case 1: Nested Regions

An OpenMP program is much more difficult to debug if it has nested parallel regions. Various restrictions apply to nested parallel constructs. Parallel lint can check nested parallel statements even if they are located in different files.

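The example below violates the restriction that a worksharing construct may not be closely nested inside a worksharing, CRITICAL, ORDERED, or MASTER construct: the DO construct in subroutine work is reached from inside a MASTER construct in the main program.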

Example

1 parameter (N = 10)

2 real, dimension(N,N) :: x, y, z

3 x = 1.0

4 y = 2.0

5 !$OMP PARALLEL DEFAULT(SHARED)

6 !$OMP MASTER

7 call work(x, y, z, N)

8 !$OMP END MASTER

9 !$OMP END PARALLEL

10 print *, z

11 end

12

13 subroutine work(x, y, z, N)

14 real, dimension(N,N) :: x, y, z

15 !$OMP DO

16 do i = 1, N

17 do j = 1, N

18 z(i,j) = x(i,j) + y(j,i)

19 end do

20 end do

21 end subroutine work

tst.f(15): error #12200: LOOP directive is not allowed in

the dynamic extent of MASTER directive (file:tst.f line:6)
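One possible correction, sketched here under the assumption that the loop is meant to be shared by the whole team, is to drop the MASTER construct so that every thread reaches the DO construct in work. The internal-procedure layout, the INTENT attributes, and the explicit-format print are choices made for this sketch rather than part of the original example.

```fortran
program nested_fixed
  implicit none
  integer, parameter :: N = 10
  real, dimension(N,N) :: x, y, z

  x = 1.0
  y = 2.0

  !$omp parallel default(shared)
  ! No MASTER here: all threads of the team call work and
  ! therefore all encounter the worksharing DO inside it.
  call work(x, y, z, N)
  !$omp end parallel

  print '(F4.1)', z(N,N)

contains

  subroutine work(x, y, z, N)
    integer, intent(in) :: N
    real, dimension(N,N), intent(in)    :: x, y
    real, dimension(N,N), intent(inout) :: z
    integer :: i, j   ! locals of a called routine are private to each thread

    !$omp do
    do i = 1, N
       do j = 1, N
          z(i,j) = x(i,j) + y(j,i)
       end do
    end do
    !$omp end do
  end subroutine work

end program nested_fixed
```

If only one thread should execute the loop, an alternative is to keep MASTER and remove the DO directive from work instead.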

Case 2: Data-Sharing Attribute Clauses

Parallelizing an existing serial application requires accurate placement of data-sharing clauses. Parallel lint can detect not only improper use of data-sharing clauses but also missing ones.

The example below demonstrates the OpenMP standard restriction: "If the LASTPRIVATE clause is used on a construct to which NOWAIT is also applied, then the original list item remains undefined until a barrier synchronization has been performed to ensure that the thread that executed the sequentially last iteration, or the lexically last SECTION construct, has stored that list item." [OpenMP standard]

Example

1 integer, parameter :: N=10

2 integer last, i

3 real, dimension(N) :: a, b, c

4 b = 10.0

5 c = 50.0

6 !$OMP PARALLEL SHARED(a, b, c, last)

7 !$OMP DO LASTPRIVATE(last)

8 do i = 1, N

9 a(i) = b(i) + c(i)

10 last = i

11 end do

12 !$OMP END DO NOWAIT

13 !$OMP SINGLE

14 call sub(last)

15 !$OMP END SINGLE

16 !$OMP END PARALLEL

17 end

18

19 subroutine sub(last)

20 integer last

21 print *, last

22 end subroutine sub

tst.f(14): error #12220: LASTPRIVATE variable "LAST" in NOWAIT

work-sharing construct is used before barrier synchronization
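A minimal sketch of one way to satisfy this restriction is to remove NOWAIT, so that the barrier implied at the end of the DO construct guarantees that last has been stored before the SINGLE construct reads it. Inserting an explicit BARRIER directive before SINGLE would be an alternative. The internal-procedure layout and the print format are assumptions of this sketch.

```fortran
program lastprivate_fixed
  implicit none
  integer, parameter :: N = 10
  integer :: last, i
  real, dimension(N) :: a, b, c

  b = 10.0
  c = 50.0

  !$omp parallel shared(a, b, c, last)
  !$omp do lastprivate(last)
  do i = 1, N
     a(i) = b(i) + c(i)
     last = i
  end do
  !$omp end do
  ! No NOWAIT above: the implied barrier ensures that the thread which
  ! ran the sequentially last iteration has already stored last.
  !$omp single
  call sub(last)
  !$omp end single
  !$omp end parallel

contains

  subroutine sub(last)
    integer, intent(in) :: last
    print '(I3)', last
  end subroutine sub

end program lastprivate_fixed
```
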

The next example demonstrates the following restriction: "Private pointers that become allocated during the execution of a parallel region should be explicitly deallocated by the program prior to the end of the parallel region to avoid memory leaks."

Example

1 integer :: OMP_GET_THREAD_NUM

2 integer, pointer :: ptr

3 integer, pointer :: a(:)

4

5 call OMP_SET_NUM_THREADS(2)

6 allocate(ptr)

7 allocate(a(2))

8 ptr = 5

9 print *, ptr

10 !$OMP PARALLEL PRIVATE(ptr) SHARED(a)

11 allocate(ptr)

12 ptr = 3

13 !$OMP CRITICAL

14 a(OMP_GET_THREAD_NUM()+1) = ptr

15 !$OMP END CRITICAL

16 !$OMP END PARALLEL

17 print *, a

18 end

as_44_1.f(11): error #12359: private pointer "PTR" should be explicitly deallocated by the

program prior to the end of parallel region (file:as_44_1.f line:10) to avoid memory leaks.
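The diagnostic can be addressed by deallocating each thread's private pointer before the parallel region ends. The sketch below is one way to do that; it uses the standard omp_lib module instead of declaring OMP_GET_THREAD_NUM by hand, and it initializes a so that the output is defined even if the runtime provides fewer than two threads. Both are choices made for this sketch.

```fortran
program private_pointer_fixed
  use omp_lib
  implicit none
  integer, pointer :: ptr
  integer, pointer :: a(:)

  call omp_set_num_threads(2)
  allocate(ptr)
  allocate(a(2))
  a = 0
  ptr = 5

  !$omp parallel private(ptr) shared(a)
  allocate(ptr)            ! each thread allocates its own private target
  ptr = 3
  !$omp critical
  a(omp_get_thread_num() + 1) = ptr
  !$omp end critical
  deallocate(ptr)          ! release the private allocation inside the region
  !$omp end parallel

  print '(2I2)', a         ! both elements are 3 when the team has two threads

  deallocate(ptr)          ! the original pointer, allocated before the region
  deallocate(a)
end program private_pointer_fixed
```
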

Case 3: Data Dependence

Data dependency issues are very difficult to debug in parallel programs due to non-deterministic behavior. Parallel lint is able to determine data dependency issues in programs without executing them.

To turn on data dependency analysis, specify severity level 3 in the parallel lint diagnostics option, that is, sc-parallel3.

Example

1 integer i, a(4)

2 !$OMP PARALLEL DO SHARED(i) NUM_THREADS(4)

3 do i=1,4

4 a(i) = loc(i)

5 end do

6 !$OMP END PARALLEL DO

7 print *,a

8 end

tst.f(3): warning #12246: flow data dependence from

(file:tst.f line:3) to (file:tst.f line:3), due to "I"

may lead to incorrect program execution in parallel mode

Case 4: Threadprivate Variables

Each thread has its own copy of a THREADPRIVATE variable. In the example below, the two loops over the threadprivate array have different bounds, so their iterations can be divided among the threads differently, and a thread in the second loop may read elements of its copy that it never assigned in the first loop.

Example

1 integer a(1000)

2 !$OMP THREADPRIVATE(a)

3 integer i, sum

4

5 !$OMP PARALLEL DO

6 do i=1,1000

7 a(i) = i

8 end do

9 !$OMP END PARALLEL DO

10 !$OMP PARALLEL DO REDUCTION(+:sum)

11 do i=10,1000

12 sum = sum + a(i)

13 end do

14 !$OMP END PARALLEL DO

15 print *,sum

16 end

tst.f(12): error #12344: THREADPRIVATE variable "A"

is used in loops with different initial values. See

loops (file:tst.f line:6) and (file:tst.f line:11).
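If the array is meant to be visible to all threads, one possible correction is to drop the THREADPRIVATE directive so that a is shared. The sketch below does that; it also initializes the accumulator before the reduction and renames it from sum to total to avoid shadowing the intrinsic SUM, both of which are changes made for this sketch rather than requirements of the diagnostic.

```fortran
program shared_array_fixed
  implicit none
  integer :: a(1000)       ! shared by default: no THREADPRIVATE directive
  integer :: i, total

  !$omp parallel do
  do i = 1, 1000
     a(i) = i
  end do
  !$omp end parallel do

  total = 0                ! the reduction adds to this initial value
  !$omp parallel do reduction(+:total)
  do i = 10, 1000
     total = total + a(i)  ! every thread sees all elements of the shared a
  end do
  !$omp end parallel do

  print '(I7)', total
end program shared_array_fixed
```
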

Case 5: Reductions

Reductions are widely used in parallel programming, but they are subject to many explicit and implicit restrictions. Parallel lint helps avoid potential problems with reductions. The example below illustrates one explicit constraint from the OpenMP API: variables that appear in a REDUCTION clause must be SHARED in the enclosing context.

Example

1 integer i, j

2 real a

3

4 !$OMP PARALLEL PRIVATE(a)

5 do i = 1, 10

6 call sub(a,i)

7 end do

8 !$OMP SINGLE

9 print *, a

10 !$OMP END SINGLE

11 !$OMP END PARALLEL

12 end

13

14 subroutine sub(a,i)

15 integer i

16 real a

17 !$OMP DO REDUCTION(+: a)

18 do j = 1, 10

19 a = a + i + j

20 end do

21 end subroutine sub

as_35_1.f(17): error #12208: variable "A" must be SHARED in the enclosing

context since it is specified in a REDUCTION clause at (file:as_35_1.f line:4)
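A minimal sketch of a conforming variant follows. It makes a shared in the enclosing parallel region and initializes it before the region, while the sequential loop index i is made private so that each thread counts its own iterations. The internal-procedure layout, the INTENT attributes, and the print format are assumptions of this sketch.

```fortran
program reduction_fixed
  implicit none
  integer :: i
  real :: a

  a = 0.0
  !$omp parallel shared(a) private(i)
  ! Every thread runs all ten iterations, so the whole team encounters
  ! the worksharing DO inside sub the same number of times.
  do i = 1, 10
     call sub(a, i)
  end do
  !$omp single
  print '(F7.1)', a
  !$omp end single
  !$omp end parallel

contains

  subroutine sub(a, i)
    real, intent(inout) :: a
    integer, intent(in) :: i
    integer :: j

    ! a is shared in the enclosing parallel region, as REDUCTION requires.
    !$omp do reduction(+: a)
    do j = 1, 10
       a = a + i + j
    end do
    !$omp end do
  end subroutine sub

end program reduction_fixed
```
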