It is based on the SSE version which is much more accurate, though very slightly slower.
This changeset also includes the following required changes:
- add packet-float to packet-int type traits
- add packet float<->int reinterpret casts
- add faster pselect for AVX based on blendv