Load Operations for Streaming SIMD Extensions

The prototypes for Streaming SIMD Extensions (SSE) intrinsics for load operations are in the xmmintrin.h header file.

Each intrinsic places its result in a 128-bit register, shown for each intrinsic as R0-R3. R0, R1, R2, and R3 each represent one of the four 32-bit pieces of the result register, with R0 being the lowest piece.

Intrinsic Name   Operation                                           Corresponding SSE Instruction
_mm_loadh_pi     Load high                                           MOVHPS reg, mem
_mm_loadl_pi     Load low                                            MOVLPS reg, mem
_mm_load_ss      Load the low value and clear the three high values  MOVSS
_mm_load1_ps     Load one value into all four words                  MOVSS + Shuffling
_mm_load_ps      Load four values, address aligned                   MOVAPS
_mm_loadu_ps     Load four values, address unaligned                 MOVUPS
_mm_loadr_ps     Load four values in reverse                         MOVAPS + Shuffling

 

__m128 _mm_loadh_pi(__m128 a, __m64 const *p)

Sets the upper two single-precision floating-point (SP FP) values with 64 bits of data loaded from the address p; the lower two values are passed through from a.

R0     R1     R2     R3
a0     a1     *p0    *p1
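
As an illustration, the following minimal sketch replaces the upper half of a vector with two floats read from memory. The helper name load_high_pair and its parameters are hypothetical; _mm_loadu_ps and _mm_storeu_ps come from the same header and are used here only to move data into and out of the register.

```c
#include <xmmintrin.h>

/* Hypothetical helper: keep a[0] and a[1], and replace the two upper
   values with pair[0] and pair[1]. The result is written to out[0..3]. */
void load_high_pair(float a[4], float pair[2], float out[4])
{
    __m128 v = _mm_loadu_ps(a);               /* v = a0, a1, a2, a3 */

    /* The intrinsic takes a pointer to 64 bits of data, so the float
       pointer is cast to __m64 const *. */
    v = _mm_loadh_pi(v, (__m64 const *)pair); /* R2 = pair[0], R3 = pair[1] */

    _mm_storeu_ps(out, v);
}
```

Calling load_high_pair with a = {1, 2, 3, 4} and pair = {10, 20} should leave {1, 2, 10, 20} in out.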

 

__m128 _mm_loadl_pi(__m128 a, __m64 const *p)

Sets the lower two SP FP values with 64 bits of data loaded from the address p; the upper two values are passed through from a.

R0     R1     R2     R3
*p0    *p1    a2     a3
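
One possible use of the two half-register loads together is assembling a vector from two pairs of floats stored at unrelated addresses. The sketch below assumes a hypothetical helper named pack_two_pairs; _mm_setzero_ps and _mm_storeu_ps are taken from the same header.

```c
#include <xmmintrin.h>

/* Hypothetical helper: build the vector lo[0], lo[1], hi[0], hi[1]
   from two separately stored pairs of floats. */
void pack_two_pairs(float lo[2], float hi[2], float out[4])
{
    __m128 v = _mm_setzero_ps();            /* start from 0, 0, 0, 0 */

    v = _mm_loadl_pi(v, (__m64 const *)lo); /* R0 = lo[0], R1 = lo[1] */
    v = _mm_loadh_pi(v, (__m64 const *)hi); /* R2 = hi[0], R3 = hi[1] */

    _mm_storeu_ps(out, v);
}
```

Because each load passes the other half of its first argument through unchanged, the order of the two calls does not matter here.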

 

__m128 _mm_load_ss(float *p)

Loads an SP FP value into the low word and clears the upper three words.

R0     R1     R2     R3
*p     0.0    0.0    0.0
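
A short sketch of the behavior described above: a single float is loaded into the low word, and the stored result shows the three upper words cleared. The helper name load_scalar is hypothetical; _mm_storeu_ps is from the same header.

```c
#include <xmmintrin.h>

/* Hypothetical helper: load *p into the low word of a register and
   write all four words of that register to out[0..3]. */
void load_scalar(float *p, float out[4])
{
    __m128 v = _mm_load_ss(p);  /* R0 = *p, R1 = R2 = R3 = 0.0 */
    _mm_storeu_ps(out, v);
}
```

For *p equal to 7.5, out should contain {7.5, 0.0, 0.0, 0.0}.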

 

__m128 _mm_load1_ps(float *p)

Loads a single SP FP value, copying it into all four words.

R0     R1     R2     R3
*p     *p     *p     *p
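
Broadcasting one value into all four words is typically useful when a whole vector needs to be combined with the same scalar. The sketch below scales four floats in place; the helper name scale4 is hypothetical, and _mm_loadu_ps, _mm_mul_ps, and _mm_storeu_ps are other intrinsics from the same header.

```c
#include <xmmintrin.h>

/* Hypothetical helper: multiply data[0..3] by *scale, in place. */
void scale4(float *scale, float data[4])
{
    __m128 s = _mm_load1_ps(scale);         /* s = *scale in all four words */
    __m128 v = _mm_loadu_ps(data);          /* v = data[0..3] */

    _mm_storeu_ps(data, _mm_mul_ps(v, s));  /* data[i] = data[i] * *scale */
}
```

With *scale equal to 2.5 and data = {1, 2, 3, 4}, data should become {2.5, 5, 7.5, 10}.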

 

__m128 _mm_load_ps(float *p)

Loads four SP FP values. The address must be 16-byte-aligned.

R0      R1      R2      R3
p[0]    p[1]    p[2]    p[3]
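
Because the address must be 16-byte-aligned, a caller that cannot guarantee alignment may want to check it first. The following sketch shows one way to do that; the helper name load4_aligned and its return convention are assumptions made for illustration.

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: copy p[0..3] to out[0..3] through an aligned
   load. Returns 0 on success, or -1 if p is not 16-byte-aligned, in
   which case nothing is loaded. */
int load4_aligned(float *p, float out[4])
{
    if (((uintptr_t)p & 15u) != 0)
        return -1;              /* an aligned load here may fault */

    _mm_storeu_ps(out, _mm_load_ps(p));
    return 0;
}
```

Suitably aligned storage can be obtained, for example, with C11 _Alignas(16) or with a union that contains an __m128 member.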

 

__m128 _mm_loadu_ps(float *p)

Loads four SP FP values. The address need not be 16-byte-aligned.

R0      R1      R2      R3
p[0]    p[1]    p[2]    p[3]
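
An unaligned load is the natural choice when the starting address is an arbitrary element of an array. As a sketch, the hypothetical helper add_neighbors below adds each of four elements to its right-hand neighbor; the second load starts at x + 1, which cannot be 16-byte-aligned if x is. _mm_add_ps and _mm_storeu_ps are from the same header.

```c
#include <xmmintrin.h>

/* Hypothetical helper: out[i] = x[i] + x[i + 1] for i = 0..3.
   x must point to at least five floats; no alignment is assumed. */
void add_neighbors(float *x, float out[4])
{
    __m128 a = _mm_loadu_ps(x);      /* a = x[0], x[1], x[2], x[3] */
    __m128 b = _mm_loadu_ps(x + 1);  /* b = x[1], x[2], x[3], x[4] */

    _mm_storeu_ps(out, _mm_add_ps(a, b));
}
```

For x = {1, 2, 3, 4, 5}, out should contain {3, 5, 7, 9}.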

 

__m128 _mm_loadr_ps(float *p)

Loads four SP FP values in reverse order. The address must be 16-byte-aligned.

R0      R1      R2      R3
p[3]    p[2]    p[1]    p[0]
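
The reversed load can be illustrated with a small sketch that writes four floats out in the opposite order. The helper name reverse4 is hypothetical, and the caller is assumed to pass a 16-byte-aligned source address; _mm_storeu_ps is from the same header.

```c
#include <xmmintrin.h>

/* Hypothetical helper: out[0..3] = p[3], p[2], p[1], p[0].
   p is assumed to be 16-byte-aligned, as the intrinsic requires. */
void reverse4(float *p, float out[4])
{
    __m128 v = _mm_loadr_ps(p);  /* R0 = p[3], R1 = p[2], R2 = p[1], R3 = p[0] */
    _mm_storeu_ps(out, v);       /* out[0] receives R0, out[3] receives R3 */
}
```

For an aligned p holding {1, 2, 3, 4}, out should contain {4, 3, 2, 1}.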