Verifying OpenMP* Using Parallel Lint

Parallel lint can accelerate the migration of sequential applications to parallel applications that use OpenMP by reducing development and debugging time. This topic explains how to use parallel lint to verify your parallel application. Parallel lint performs static global analysis of a program to diagnose existing and potential parallelization issues. One advantage of parallel lint is that its checks take into account the whole stack of parallel regions and worksharing constructs, even when they are placed in different routines.

Example

 1  //**************************************************
 2  //* for, sections, single pragmas                  *
 3  //* that bind to the same parallel pragma are      *
 4  //* not allowed to be nested one inside the other  *
 5  //*                                                *
 6  //**************************************************
 7
 8  #include <stdio.h>
 9  #include <omp.h>
10
11  void fff(int ii) {
12      printf("We've got i=%d NTR=%d\n", ii, omp_get_thread_num());
13  }
14
15  void sec2(int i) {
16  #pragma omp single
17      fff(i+2);
18  }
19
20  int main(int n) {
21      int i=3;
22      omp_set_num_threads(3);
23  #pragma omp parallel
24  #pragma omp sections
25      {
26  #pragma omp section
27          sec2(i);
28      }
29      return 0;
30  }

as_12_01.cpp(16): error #12200: single pragma is not allowed in the dynamic extent of sections pragma (file:as_12_01.cpp line:24)

This makes parallel lint a powerful tool for diagnosing misuse of OpenMP pragmas in the context of the whole program. Parallel lint also provides checks that help debug errors related to data dependencies and race conditions.

Example

 1  #include <stdio.h>
 2  #include <omp.h>
 3
 4  int main(void)
 5  {
 6      int i;
 7      int factorial[10];
 8
 9      factorial[0]=1;
10  #pragma omp parallel for
11      for (i=1; i < 10; i++) {
12
13          factorial[i] = i * factorial[i-1];
14      }
15
16      return 0;
17  }

omp.c(13): warning #12246: flow data dependence from (file:omp.c line:13) to (file:omp.c line:13), due to "factorial" may lead to incorrect program execution in parallel mode
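One way to resolve the dependence reported above, sketched here under the assumption that the table is small enough for serial computation to be acceptable, is to leave the recurrence serial. Each entry needs the previous one, so the iterations cannot run concurrently as written. The function name fill_factorials is hypothetical and is not part of the original example.

```c
#include <stddef.h>

/* Hypothetical helper: fills table[0..n-1] with 0!, 1!, ..., (n-1)!.
   The loop carries a flow dependence (table[i] reads table[i-1]), so it is
   deliberately left without "#pragma omp parallel for". */
void fill_factorials(int *table, size_t n)
{
    if (n == 0) {
        return;
    }
    table[0] = 1;
    for (size_t i = 1; i < n; i++) {
        table[i] = (int)i * table[i - 1];
    }
}
```

Calling fill_factorials on a 10-element int array reproduces the table from the example without the warning, because no worksharing construct is applied to the dependent loop.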

Basics of Compilation

To enable parallel lint analysis, pass the /Qdiag-enable:sc-parallel[n] option (Windows) or the -diag-enable sc-parallel[n] option (Linux and Mac OS) to the compiler.

Parallel lint is available for IA-32 and Intel® 64 architectures only.

Parallel lint requires the OpenMP option: /Qopenmp (Windows) or -openmp (Linux and Mac OS). This option forces the compiler to process OpenMP pragmas so that parallelization specifics are available for parallel lint analysis. If parallel lint is used without the OpenMP option, the compiler issues the following error message:

command line error: parallel lint not called due to lack of OpenMP parallelization option, please add option /Qopenmp when using parallel lint.

If you are using Microsoft Visual Studio*, you should create a separate build configuration devoted to parallel lint, since object and library files produced by parallel lint should not be used to build your product.

Basic Checks

Parallel lint provides a broad set of OpenMP checks which are useful both for beginners in parallel programming using OpenMP and for advanced parallel developers. See the Overview section of this manual.

The examples below highlight the most useful features of parallel lint.

Case 1: Nested Regions

An OpenMP program is much more difficult to debug if it has nested parallel regions. Various restrictions apply to nested parallel constructs. Parallel lint can check nested parallel statements even if they are located in different files.

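The example below violates the following restriction: a worksharing construct may not be closely nested inside a worksharing, critical, ordered, or master construct. Here, the sections construct in fff is executed inside the loop construct in main.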

Example

 1  #include <stdio.h>
 2  #include <omp.h>
 3
 4  int fff(int ii)
 5  {
 6      int rez;
 7
 8  #pragma omp sections
 9      {
10          rez = ii;
11  #pragma omp section
12          rez = ii+2;
13      }
14      return rez;
15  }
16
17
18  int
19  main(int n)
20  {
21      int i;
22
23      omp_set_num_threads(3);
24
25  #pragma omp parallel
26  #pragma omp for ordered
27      for(i=1; i<150; i=i+2) {
28          fff(i);
29  #pragma omp ordered
30          if(i < 50 || i > 52) {
31              printf("i=%d NU=%d \n", i, omp_get_thread_num() );
32          }
33      }
34      return 0;
35  }

omp.c(8): error #12200: sections pragma is not allowed in the dynamic extent of loop pragma (file:omp.c line:26)
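A minimal sketch of one possible restructuring follows, assuming the routine called from the loop does not need its own worksharing construct. The names fff_serial and run_loop are hypothetical, the ordered output from the original example is omitted for brevity, and the results are stored in an array so they can be inspected.

```c
/* Hypothetical replacement for fff(): it contains no worksharing construct,
   so calling it from inside the parallel loop violates no nesting rule. */
int fff_serial(int ii)
{
    return ii + 2;
}

/* Hypothetical driver: stores fff_serial(i) for every odd i in [1, 150).
   results must have room for 75 elements. */
void run_loop(int *results)
{
    int i;

#pragma omp parallel for
    for (i = 1; i < 150; i = i + 2) {
        /* Each iteration writes a distinct element, so there is no race. */
        results[i / 2] = fff_serial(i);
    }
}
```

If the called routine really does need to distribute work among threads, the design has to change in a different way, for example by moving that work outside the loop construct.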

Case 2: Data-Sharing Attribute Clauses

Parallelizing an existing serial application requires accurate placement of data-sharing clauses. Parallel lint can help detect both improper use of data-sharing clauses and the absence of data-sharing clauses or pragmas where they are needed.

The example below demonstrates the OpenMP standard restriction: "If the lastprivate clause is used on a construct to which nowait is also applied, then the original list item remains undefined until a barrier synchronization has been performed to ensure that the thread that executed the sequentially last iteration, or the lexically last SECTION construct, has stored that list item." [OpenMP standard]

Example

 1  #include <stdio.h>
 2  #include <omp.h>
 3
 4  int main(void) {
 5      int last, i;
 6      float a[10], b[10];
 7
 8      for (i=0; i < 10; i++) {
 9          b[i] = i*0.5;
10      }
11
12  #pragma omp parallel shared(a,b,last)
13      {
14  #pragma omp for lastprivate(last) nowait
15          for (i=0; i < 10; i++) {
16              a[i] = b[i] * 2;
17              last = a[i];
18          }
19  #pragma omp single
20          printf("%d\n", last);
21      }
22
23      return 0;
24  }

omp.c(20): error #12220: lastprivate variable "last" in nowait work-sharing construct is used before barrier synchronization
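A minimal sketch of one way to satisfy the restriction quoted above is to drop the nowait clause, so that the implicit barrier at the end of the loop construct is reached before any thread reads the variable. The function name compute_last and the variable observed are hypothetical. The value is returned rather than printed so that it can be checked by the caller.

```c
/* Hypothetical function mirroring the example above; returns the value of
   "last" as seen after the worksharing loop. */
int compute_last(void)
{
    int last = 0, i;
    int observed = 0;
    float a[10], b[10];

    for (i = 0; i < 10; i++) {
        b[i] = i * 0.5f;
    }

#pragma omp parallel shared(a, b, last, observed)
    {
        /* No nowait: the implicit barrier at the end of the loop ensures that
           the thread running the last iteration has stored "last". */
#pragma omp for lastprivate(last)
        for (i = 0; i < 10; i++) {
            a[i] = b[i] * 2;
            last = (int)a[i];
        }
#pragma omp single
        observed = last;
    }

    return observed;
}
```

An alternative, if nowait must stay for performance reasons, is to place an explicit barrier between the loop and the first use of the variable.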

Case 3: Data Dependence

Data dependency issues are very difficult to debug in parallel programs because they cause non-deterministic behavior. Parallel lint can detect data dependency issues in a program without executing it.

To turn on data dependency analysis, specify severity level 3 in the parallel lint diagnostics option (for example, sc-parallel3).

Example

 1  #include <stdio.h>
 2  #include <omp.h>
 3
 4  int main(void)
 5  {
 6      int i;
 7      float a[100];
 8
 9  #pragma omp parallel for
10      for (i=0; i < 100; i++) {
11          a[i] = i*0.66;
12      }
13
14  #pragma omp parallel for
15      for (i=1; i < 100; i++) {
16          a[i] = a[i-1]*0.5 + a[i]*0.5;
17      }
18
19      return 0;
20  }

omp.c(16): warning #12246: flow data dependence from (file:omp.c line:16) to (file:omp.c line:16), due to "a" may lead to incorrect program execution in parallel mode
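One possible way to remove this dependence is sketched below, under the assumption that the intent is to average each element with the original value of its left neighbor. Writing to a separate output array means no iteration reads a value that another iteration writes. Note that this does not reproduce the serial in-place loop, which uses the already updated a[i-1]; if that recurrence is what you need, the loop has to stay serial. The function name smooth is hypothetical.

```c
/* Hypothetical helper: out[0] = in[0], and for i >= 1
   out[i] = 0.5*in[i-1] + 0.5*in[i]. The input array is only read and the
   output array is only written, so iterations are independent. */
void smooth(const float *in, float *out, int n)
{
    int i;

    if (n <= 0) {
        return;
    }
    out[0] = in[0];

#pragma omp parallel for
    for (i = 1; i < n; i++) {
        out[i] = in[i - 1] * 0.5f + in[i] * 0.5f;
    }
}
```

The cost of this approach is a second array of the same size as the input.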

Case 4: Threadprivate Variables

Example

 1  #include <stdio.h>
 2  #include <omp.h>
 3
 4  int a[1000];
 5  #pragma omp threadprivate (a)
 6
 7  int main(int n) {
 8      int i;
 9      int sum =0;
10
11  #pragma omp parallel for
12      for (i=0; i < 1000; i++) {
13          a[i] = i;
14      }
15  #pragma omp parallel for reduction (+:sum)
16      for (i=10; i < 1000; i++) { // inconsistent init value
17          sum = sum + a[i];
18      }
19      printf("%d\n",sum);
20      return 0;
21  }

omp.cpp(17): error #12344: threadprivate variable "a" is used in loops with different initial values. See loops (file:omp.cpp line:12) and (file:omp.cpp line:16).
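A likely reading of this diagnostic is that each thread owns a separate copy of a threadprivate variable, so when two loops start at different values and their iterations are divided differently, a thread may read elements of its copy that it never wrote. The sketch below assumes the array is actually meant to be shared by all threads and therefore simply drops the threadprivate pragma. The function name sum_tail and the macro COUNT are hypothetical.

```c
#define COUNT 1000

/* Ordinary file-scope array: shared by all threads (no threadprivate). */
static int a[COUNT];

/* Hypothetical function: fills a[] in one parallel loop, then sums
   a[10..COUNT-1] in a second parallel loop using a reduction. */
int sum_tail(void)
{
    int i;
    int sum = 0;

#pragma omp parallel for
    for (i = 0; i < COUNT; i++) {
        a[i] = i;
    }

#pragma omp parallel for reduction(+:sum)
    for (i = 10; i < COUNT; i++) {
        sum = sum + a[i];
    }

    return sum;
}
```

If per-thread copies are genuinely required, the other option is to keep the threadprivate pragma and make sure both loops distribute the same iterations to the same threads.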