Flipping sign on packed SSE floats

Question!

I'm looking for the most efficient method of flipping the sign on all four floats packed in an SSE register.

I have not found an intrinsic for this in the Intel Architecture Software Developer's Manual. Below are the things I've already tried.

For each case I looped over the code 10 billion times and got the wall-clock time indicated. I'm trying to at least match the 4 seconds that my non-SIMD approach takes, which just uses the unary minus operator.


[48 sec]
_mm_sub_ps( _mm_setzero_ps(), vec );


[32 sec]
_mm_mul_ps( _mm_set1_ps( -1.0f ), vec );


[9 sec]

union NegativeMask {
    unsigned int intRep;  /* unsigned, so 0x80000000 fits without conversion */
    float        fltRep;
} negMask;
negMask.intRep = 0x80000000u;  /* only the sign bit set */

_mm_xor_ps( _mm_set1_ps( negMask.fltRep ), vec );


The compiler is gcc 4.2 with -O3. The CPU is an Intel Core 2 Duo.

By : nsanders


Answers

A life lesson about coding until 3 a.m....

I never tried just using the unary minus on my packed vector. That actually compiles and has the exact same performance as the non-SIMD approach.

By : nsanders
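
For readers who want to see it spelled out, here is a minimal sketch of the unary minus applied to a packed vector. It assumes GCC or Clang, where `__m128` is a built-in vector type and accepts the operator directly; other compilers may reject `-vec`. The names `negate_packed` and `lane` are made up for illustration and do not come from the post.

```c
#include <xmmintrin.h>  /* SSE intrinsics and the __m128 type */

/* Negate all four lanes with the plain unary minus.
   Relies on the GCC/Clang vector extension; this is not standard C. */
static inline __m128 negate_packed(__m128 vec)
{
    return -vec;
}

/* Hypothetical helper: copy lane i (0..3) out to a scalar for inspection. */
static inline float lane(__m128 vec, int i)
{
    float out[4];
    _mm_storeu_ps(out, vec);
    return out[i];
}
```

What instruction sequence the compiler emits for `-vec` depends on the compiler version and flags, so it is worth checking the generated assembly before relying on the timing.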


Just to complete your own answer, here is what the GCC documentation says about these built-in vector types:

The types defined in this manner can be used with a subset of normal C
operations.  Currently, GCC will allow using the following operators on
these types: +, -, *, /, unary minus, ^, |, &, ~.
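
As a rough illustration of what the quoted passage means, the sketch below declares a four-float vector type with GCC's `vector_size` attribute and applies a few of the listed operators element-wise. The names `v4sf`, `v4sf_negate` and `v4sf_mul_sub` are assumptions chosen for this example; it needs GCC or a compatible compiler such as Clang.

```c
/* Four packed floats (16 bytes), declared with GCC's vector_size attribute. */
typedef float v4sf __attribute__((vector_size(16)));

/* Unary minus is applied to every element. */
static inline v4sf v4sf_negate(v4sf v)
{
    return -v;
}

/* Binary operators also work element-wise: computes a * b - c per lane. */
static inline v4sf v4sf_mul_sub(v4sf a, v4sf b, v4sf c)
{
    return a * b - c;
}
```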


The union isn't really needed. This gives the best of all worlds (readability, speed, and portability):

_mm_xor_ps(vec, _mm_set1_ps(-0.f))
By : LiraNuna
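
The XOR trick works because `-0.0f` is the float whose only set bit is the sign bit (0x80000000), so XOR-ing with it flips the sign of each lane and leaves exponent and mantissa untouched. A minimal sketch, with `flip_sign` and `to_array` as hypothetical helper names that are not part of any library:

```c
#include <xmmintrin.h>  /* SSE: _mm_xor_ps, _mm_set1_ps, _mm_storeu_ps */

/* Flip the sign of all four lanes by XOR-ing with a sign-bit mask. */
static inline __m128 flip_sign(__m128 vec)
{
    /* -0.0f has only the sign bit set; broadcast it to all four lanes. */
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    return _mm_xor_ps(vec, sign_mask);
}

/* Hypothetical helper: store the four lanes to a plain array. */
static inline void to_array(__m128 vec, float out[4])
{
    _mm_storeu_ps(out, vec);
}
```

One difference from subtracting from zero: the XOR also flips the sign of zeros, so +0.0f becomes -0.0f, whereas 0.0f - (+0.0f) stays +0.0f under the default rounding mode. In a hot loop the mask is loop-invariant, so building it once outside the loop avoids depending on the optimizer to hoist it.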

