efficient
Improving Run-time Efficiency
Improving I/O Performance
auto-parallelizer
compilation
Efficient Compilation
Optimizing Compilation Process Overview
implied-DO loop collapsing
inlining
parallelizer
PGO options
use of arrays
use of record buffers
END PARALLEL DO
using
endian data
and OpenMP* extension routines
auto-parallelization
denormal
dumping profile information
for auto-parallelization
for little endian conversion
for profile-guided optimization
FORT_BUFFERED
loop constructs
OMP_NUM_THREADS
OMP_SCHEDULE
OpenMP*
parallel program development
PROF_DIR
PROF_DUMP_INTERVAL
routines overriding
using OpenMP*
using profile-guided optimization
vectorization
enhancing optimization
enhancing performance
EQUIVALENCE
effect on run-time efficiency
exclude code
code coverage tool
execution environment routines
execution flow
execution mode
explicit-shape arrays
.dpi
Basic PGO Options
Profmerge and Proforder Utilities
Code-coverage Tool
Test-prioritization Tool
.dyn
Basic PGO Options
Dumping and Resetting Profile Information
Dumping Profile Information
PGO Environment Variables
Profmerge and Proforder Utilities
Code-coverage Tool
Test-prioritization Tool
Profile an Application
.spi
Code-coverage Tool
Test-prioritization Tool
formatted
OpenMP* header
optimizing
pgopti.dpi
pgopti.spi
source
Efficient Compilation
Example of Profile-Guided Optimization
unformatted
FIRSTPRIVATE
in worksharing constructs
summary of data scope attribute clauses
using
flow dependency in loops
flush-to-zero mode
formatted files
FORT_BUFFERED environment variable
FTZ mode
function expansion
function grouping optimization
function order list
enabling or disabling
function order lists
function ordering optimization
function preemption
general compiler directives
Prefetching Support
Loop Unrolling Support
affecting data prefetches
Loop Count and Loop Distribution
Prefetching Support
affecting software pipelining
Loop Count and Loop Distribution
Pipelining for IA-64 Architecture
for auto-parallelization
for IA-32 architecture
for improving run-time efficiency
for inlining functions
for profile-guided optimization
for vectorization
Vectorization Overview
Key Programming Guidelines for Vectorization
instrumented code
processor-specific code
profile-optimized executable
profiling information
reports
high performance
high performance programming
applications for
dispatch options for
guidelines for
improving performance
list
options for
parsing
performance
processors for
Parallelism Overview
Automatic Processor-specific Optimization (IA-32 Architecture)
report generation
high-level optimization
high-level optimizer
HLO Overview
Optimizer Report Generation
HLO
HLO Overview
High-Level Optimization (HLO) Report
reports
hotspots
Hyper-Threading Technology
parallel loops
thread pools
IA-32 architecture based applications
HLO
methods of parallelization
options
Targeting IA-32 and Intel(R) 64 Architecture Processors Automatically
Targeting Multiple IA-32 and Intel(R) 64 Architecture Processors for Run-time Performance
targeting
Targeting IA-32 and Intel(R) 64 Architecture Processors Automatically
Targeting Multiple IA-32 and Intel(R) 64 Architecture Processors for Run-time Performance
using intrinsics in
IA-64 architecture based applications
auto-vectorization in
HLO
methods of parallelization
options
pipelining for
report generation
targeting
using intrinsics in
ILO
implied-DO loop
improving
code
I/O performance
run-time performance
initialization values for reduction variables
inlining
Efficient Compilation
Controlling Inline Expansion of User Functions
Improving Run-time Efficiency
Profile-guided Optimizations Overview
User Directed Inline Expansion of User Functions
Inline Function Expansion
compiler directed
developer directed
preemption
instruction-level parallelism
instrumentation
compilation
execution
feedback compilation
generating
preventing aliasing
program
Intel(R) 64 architecture based applications
HLO
methods of parallelization
options
Targeting IA-32 and Intel(R) 64 Architecture Processors Automatically
Targeting Multiple IA-32 and Intel(R) 64 Architecture Processors for Run-time Performance
targeting
Targeting IA-32 and Intel(R) 64 Architecture Processors Automatically
Targeting Multiple IA-32 and Intel(R) 64 Architecture Processors for Run-time Performance
using intrinsics in
Intel(R) architectures
Intel(R) compatibility libraries for OpenMP*
Intel(R) compiler-generated code
Intel(R) extension environment variables
Intel(R) extension routines
Intel(R) linking tools
Intel(R)-extended intrinsics
INTEL_PROF_DUMP_CUMULATIVE environment variable
INTEL_PROF_DUMP_INTERVAL environment variable
intermediate language scalar optimizer
intermediate representation (IR)
Using IPO
Interprocedural Optimization (IPO) Overview
intermediate results
using memory for
internal subprograms
interprocedural optimizations
Efficient Compilation
Controlling Inline Expansion of User Functions
Profile-guided Optimizations Overview
Optimizer Report Generation
capturing intermediate output
code layout
compilation
compiling
considerations
creating libraries
initiating
issues
large programs
linking
Interprocedural Optimization (IPO) Overview
Using IPO
options
overview
performance
reports
using
whole program analysis
xiar
xild
xilibtool
intrinsics
introduction to Optimizing Applications
IR
Using IPO
Interprocedural Optimization (IPO) Overview
IVDEP
effect of compiler option on
effect when tuning applications
IVDEP directive