To accelerate migration of sequential applications to parallel applications using OpenMP, parallel lint can be very helpful by reducing application development and debugging time. This topic explains how to use parallel lint to optimize your parallel application. Parallel lint performs static global analysis of a program to diagnose existing and potential issues with parallelization. One of the advantages of parallel lint is that it makes its checks considering the whole stack of parallel regions and worksharing constructs, even when placed in different routines.
Example

     1 //**************************************************
     2 //* for, sections, single pragmas                  *
     3 //* that bind to the same parallel pragma are      *
     4 //* not allowed to be nested one inside the other  *
     5 //*                                                *
     6 //**************************************************
     7
     8 #include <stdio.h>
     9 #include <omp.h>
    10
    11 void fff(int ii) {
    12     printf("We've got i=%d NTR=%d\n",ii, omp_get_thread_num() );
    13 }
    14
    15 void sec2(int i){
    16     #pragma omp single
    17     fff(i+2);
    18 }
    19
    20 int main(int n) {
    21     int i=3;
    22     omp_set_num_threads(3);
    23     #pragma omp parallel
    24     #pragma omp sections
    25     {
    26         #pragma omp section
    27         sec2(i);
    28     }
    29     return 0;
    30 }

    as_12_01.cpp(16): error #12200: single pragma is not allowed in the dynamic extent of sections pragma (file:as_12_01.cpp line:24)
This makes parallel lint a powerful tool for diagnosing the use of OpenMP pragmas in the context of the whole program. Parallel lint also provides checks that help debug errors caused by data dependencies and race conditions.
Example

     1 #include <stdio.h>
     2 #include <omp.h>
     3
     4 int main(void)
     5 {
     6     int i;
     7     int factorial[10];
     8
     9     factorial[0]=1;
    10     #pragma omp parallel for
    11     for (i=1; i < 10; i++) {
    12
    13         factorial[i] = i * factorial[i-1];
    14     }
    15
    16     return 0;
    17 }

    omp.c(13): warning #12246: flow data dependence from (file:omp.c line:13) to (file:omp.c line:13), due to "factorial" may lead to incorrect program execution in parallel mode
To enable parallel lint analysis, pass the /Qdiag-enable:sc-parallel[n] (Windows) or -diag-enable sc-parallel[n] (Linux and Mac OS) option to the compiler, where the optional n selects the severity level of the diagnostics.
Parallel lint is available for IA-32 and Intel® 64 architectures only.
Parallel lint requires the OpenMP option, /Qopenmp (Windows) or -openmp (Linux and Mac OS). This option forces the compiler to process OpenMP pragmas so that the parallelization details are available for parallel lint analysis. If parallel lint is used without the OpenMP option, the compiler issues the following error message:
command line error: parallel lint not called due to lack of OpenMP
parallelization option, please add option /Qopenmp when using parallel lint.
If you are using Microsoft Visual Studio*, you should create a separate build configuration devoted to parallel lint, since object and library files produced by parallel lint should not be used to build your product.
Parallel lint provides a broad set of OpenMP checks which are useful both for beginners in parallel programming using OpenMP and for advanced parallel developers. See the Overview section of this manual.
The examples below highlight the most useful features of parallel lint.
An OpenMP program is much more difficult to debug if it has nested parallel regions. Various restrictions apply to nested parallel constructs. Parallel lint can check nested parallel statements even if they are located in different files.
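The example below violates the rule that a worksharing construct may not be closely nested inside a worksharing, critical, ordered, or master construct: the sections construct in fff() is executed within the dynamic extent of the for loop in main().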
Example

     1 #include <stdio.h>
     2 #include <omp.h>
     3
     4 int fff(int ii)
     5 {
     6     int rez;
     7
     8     #pragma omp sections
     9     {
    10         rez = ii;
    11         #pragma omp section
    12         rez = ii+2;
    13     }
    14     return rez;
    15 }
    16
    17
    18 int
    19 main(int n)
    20 {
    21     int i;
    22
    23     omp_set_num_threads(3);
    24
    25     #pragma omp parallel
    26     #pragma omp for ordered
    27     for(i=1; i<150; i=i+2) {
    28         fff(i);
    29         #pragma omp ordered
    30         if(i < 50 || i > 52) {
    31             printf("i=%d NU=%d \n", i, omp_get_thread_num() );
    32         }
    33     }
    34     return 0;
    35 }

    omp.c(8): error #12200: sections pragma is not allowed in the dynamic extent of loop pragma (file:omp.c line:26)
Parallelizing an existing serial application requires accurate placement of data sharing clauses. Parallel lint can help detect not only improper use of data sharing clauses but also places where a required clause is missing.
The example below demonstrates the OpenMP standard restriction: "If the lastprivate clause is used on a construct to which nowait is also applied, then the original list item remains undefined until a barrier synchronization has been performed to ensure that the thread that executed the sequentially last iteration, or the lexically last SECTION construct, has stored that list item." [OpenMP standard]
Example

     1 #include <stdio.h>
     2 #include <omp.h>
     3
     4 int main(void) {
     5     int last, i;
     6     float a[10], b[10];
     7
     8     for (i=0; i < 10; i++) {
     9         b[i] = i*0.5;
    10     }
    11
    12     #pragma omp parallel shared(a,b,last)
    13     {
    14         #pragma omp for lastprivate(last) nowait
    15         for (i=0; i < 10; i++) {
    16             a[i] = b[i] * 2;
    17             last = a[i];
    18         }
    19         #pragma omp single
    20         printf("%d\n", last);
    21     }
    22
    23     return 0;
    24 }

    omp.c(20): error #12220: lastprivate variable "last" in nowait work-sharing construct is used before barrier synchronization
Data dependency issues are very difficult to debug in parallel programs due to non-deterministic behavior. Parallel lint is able to determine data dependency issues in programs without executing them.
To turn on data dependency analysis, specify severity level 3 for parallel lint diagnostics, for example /Qdiag-enable:sc-parallel3 (Windows) or -diag-enable sc-parallel3 (Linux and Mac OS).
Example

     1 #include <stdio.h>
     2 #include <omp.h>
     3
     4 int main(void)
     5 {
     6     int i;
     7     float a[100];
     8
     9     #pragma omp parallel for
    10     for (i=0; i < 100; i++) {
    11         a[i] = i*0.66;
    12     }
    13
    14     #pragma omp parallel for
    15     for (i=1; i < 100; i++) {
    16         a[i] = a[i-1]*0.5 + a[i]*0.5;
    17     }
    18
    19     return 0;
    20 }

    omp.c(16): warning #12246: flow data dependence from (file:omp.c line:16) to (file:omp.c line:16), due to "a" may lead to incorrect program execution in parallel mode
Example

     1 #include <stdio.h>
     2 #include <omp.h>
     3
     4 int a[1000];
     5 #pragma omp threadprivate (a)
     6
     7 int main(int n) {
     8     int i;
     9     int sum =0;
    10
    11     #pragma omp parallel for
    12     for (i=0; i < 1000; i++) {
    13         a[i] = i;
    14     }
    15     #pragma omp parallel for reduction (+:sum)
    16     for (i=10; i < 1000; i++) { // inconsistent init value
    17         sum = sum + a[i];
    18     }
    19     printf("%d\n",sum);
    20     return 0;
    21 }

    omp.cpp(17): error #12344: threadprivate variable "a" is used in loops with different initial values. See loops (file:omp.cpp line:12) and (file:omp.cpp line:16).