linux - How do I know if I can compile with FMA instruction sets?
I have seen questions about how to use the FMA instruction set, but before I start using it, I'd first like to know whether I can (that is, whether my processor supports it). I found a post saying that I needed to look at the output of (working on Linux):
more /proc/cpuinfo
to find out. I get this:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 30
model name : Intel(R) Xeon(R) CPU X3470 @ 2.93GHz
stepping : 5
cpu MHz : 2933.235
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 5866.47
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
What seems interesting is the flags part, but I am not sure how to find out from this list whether the processor supports these instructions.
Does anyone know how to find out? Thank you.
This is an old question, but still a relevant one.
I assume you want to detect it in C/C++ at compile time.
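If the goal is instead the run-time check the question describes (does this particular CPU support FMA), the flags line is what matters: on x86 Linux, FMA3 shows up there as the fma flag and AMD FMA4 as fma4. Here is a minimal sketch under that assumption; cpuinfo_has_flag and host_cpu_has_fma are hypothetical helper names, not part of any library:

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Returns true if `flag` appears as a whole word on a "flags" line of
// /proc/cpuinfo-style text. Whole-word matching keeps "fma" from
// matching "fma4".
bool cpuinfo_has_flag(std::istream& cpuinfo, const std::string& flag) {
    std::string line;
    while (std::getline(cpuinfo, line)) {
        // Only lines that start with "flags" carry the feature list.
        if (line.compare(0, 5, "flags") != 0) continue;
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::istringstream words(line.substr(colon + 1));
        std::string word;
        while (words >> word) {
            if (word == flag) return true;
        }
    }
    return false;
}

// Convenience wrapper for the running machine (Linux only).
// Returns false if /proc/cpuinfo cannot be opened.
bool host_cpu_has_fma() {
    std::ifstream cpuinfo("/proc/cpuinfo");
    return cpuinfo && cpuinfo_has_flag(cpuinfo, "fma");
}
```

The flags listed in the question contain neither fma nor avx2, so that Xeon X3470 does not support FMA.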
The FP_FAST_FMA macro is not a reliable way to detect the FMA instruction set. This macro is defined in "math.h"/<cmath> if std::fma is faster than x*y+z, which is possible if it is an intrinsic function based on the FMA instruction set. Otherwise it uses a non-intrinsic function, which is slow. As of 2016, GCC's default glibc/libstdc++ defines this macro, but other standard library implementations don't (including LLVM libc++, ICC's, and MSVC's). That doesn't mean they don't implement std::fma as an intrinsic when possible; they just forgot to define the macro.
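To see what a given toolchain reports, a small sketch like the following can help. The function names are hypothetical. It only shows whether the standard library advertises a fast std::fma; as explained above, a missing macro says nothing about the hardware.

```cpp
#include <cmath>

// True when <cmath> defines FP_FAST_FMA, i.e. the library claims that
// std::fma on double is fast. False does NOT imply that FMA is absent.
constexpr bool fast_fma_advertised() {
#ifdef FP_FAST_FMA
    return true;
#else
    return false;
#endif
}

// std::fma always computes x*y+z with a single rounding; only its
// speed depends on whether a hardware instruction backs it.
double fused_mul_add(double x, double y, double z) {
    return std::fma(x, y, z);
}
```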
Reliable FMA detection
To reliably detect FMA (or any instruction set) at compile time, you need to use instruction-set-specific macros. These macros are defined by the compiler based on the selected target architecture and/or instruction sets.
There is a __FMA__ macro for FMA/FMA3 support, and a __FMA4__ macro for AMD FMA4 support. GCC, Clang, and ICC define them.
Unfortunately, MSVC doesn't define any instruction-set-specific macros other than __AVX__ and __AVX2__.
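As an illustration of how these macros are typically used, here is a minimal sketch that picks an FMA3 intrinsic when __FMA__ is defined and falls back to a plain multiply and add otherwise. The names kHasFma3, kHasFma4, and fmadd_scalar are hypothetical. On MSVC this always takes the fallback path, because __FMA__ is never defined there.

```cpp
#if defined(__FMA__)
#include <immintrin.h>  // declares _mm_fmadd_sd and the other FMA3 intrinsics
constexpr bool kHasFma3 = true;
#else
constexpr bool kHasFma3 = false;
#endif

#if defined(__FMA4__)
constexpr bool kHasFma4 = true;
#else
constexpr bool kHasFma4 = false;
#endif

// Computes a*b + c. With FMA3 enabled this is one fused instruction
// with a single rounding; otherwise it is an ordinary expression.
double fmadd_scalar(double a, double b, double c) {
#if defined(__FMA__)
    const __m128d r =
        _mm_fmadd_sd(_mm_set_sd(a), _mm_set_sd(b), _mm_set_sd(c));
    return _mm_cvtsd_f64(r);
#else
    return a * b + c;
#endif
}
```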
Cross-compiler FMA detection
For Intel processors, FMA was introduced together with AVX2 in Intel Haswell.
For AMD processors, things are a little bit messy. FMA4 was introduced with AVX and XOP in AMD Bulldozer. FMA3 (the Intel FMA equivalent) was introduced in AMD Piledriver. You can distinguish Piledriver from its predecessor Bulldozer at compile time by the presence of the FMA (__FMA__ macro) and BMI (__BMI__ macro) instruction sets. Unfortunately, MSVC doesn't define either.
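The distinction described above could be sketched as follows for GCC, Clang, and ICC. The function name and the returned labels are hypothetical. The result reflects only which instruction sets were enabled for the build (for example through -march), not the CPU the program eventually runs on.

```cpp
// Classifies the build target by the FMA-related macros that are set.
const char* amd_fma_generation() {
#if defined(__FMA4__) && defined(__FMA__) && defined(__BMI__)
    return "FMA4 + FMA3 + BMI: Piledriver-class target";
#elif defined(__FMA4__)
    return "FMA4 without FMA3: Bulldozer-class target";
#elif defined(__FMA__)
    return "FMA3 without FMA4";
#else
    return "no FMA enabled";
#endif
}
```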
Nevertheless, all Intel processors and all AMD processors support FMA/FMA3 if AVX2 is present.
If you want cross-compiler detection of whether the target architecture supports FMA/FMA3, you must detect the __AVX2__ macro, since it is defined by all major compilers (including MSVC) if AVX2 is enabled:
// Treat AVX2 as implying FMA3 when the compiler does not define __FMA__ itself.
#if !defined(__FMA__) && defined(__AVX2__)
#define __FMA__ 1
#endif
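A variation on the same idea, as a hedged sketch: HAVE_FMA3 and mul_add are hypothetical names. Unlike the snippet above, it avoids defining a compiler-reserved name and applies the AVX2 fallback only on MSVC (detected through _MSC_VER), because on GCC and Clang -mavx2 alone does not enable FMA, so __AVX2__ without __FMA__ means the FMA intrinsics are not usable there.

```cpp
#include <cmath>

// 1 when FMA3 code generation is enabled for this build, else 0.
#if defined(__FMA__) || (defined(_MSC_VER) && defined(__AVX2__))
#define HAVE_FMA3 1
#else
#define HAVE_FMA3 0
#endif

constexpr bool have_fma3 = HAVE_FMA3 != 0;

// Uses std::fma only when the hardware instruction is expected to
// back it; otherwise avoids the potentially slow software fallback.
double mul_add(double a, double b, double c) {
#if HAVE_FMA3
    return std::fma(a, b, c);
#else
    return a * b + c;
#endif
}
```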
Unfortunately, there is no reliable way to detect AMD FMA4 using only the __AVX__ and __AVX2__ macros.
Notes
FMA instructions are available in a program only if they are enabled in the compiler. In GCC and Clang you need to set a proper target architecture (like -march=haswell) or manually enable the FMA instruction set with the -mfma flag. ICC enables FMA automatically with the -xAVX2 flag. MSVC enables FMA automatically with the /arch:AVX2 option.
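If a project depends on those flags, one option is to fail the build early when they are missing. This is only a sketch: REQUIRE_FMA and FMA_ENABLED_AT_BUILD are hypothetical macros, and the guard is opt-in so that the file still compiles when REQUIRE_FMA is not defined.

```cpp
// 1 when the compiler was told to generate FMA3 code, else 0.
#if defined(__FMA__) || (defined(_MSC_VER) && defined(__AVX2__))
#define FMA_ENABLED_AT_BUILD 1
#else
#define FMA_ENABLED_AT_BUILD 0
#endif

// Opt-in guard: define REQUIRE_FMA (e.g. -DREQUIRE_FMA) to turn a
// missing flag into a compile error instead of a silent slow path.
#if defined(REQUIRE_FMA) && !FMA_ENABLED_AT_BUILD
#error "FMA is not enabled: pass -march=haswell or -mfma (GCC/Clang) or /arch:AVX2 (MSVC)"
#endif

constexpr bool fma_enabled_at_build() {
    return FMA_ENABLED_AT_BUILD != 0;
}
```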
AMD announced that it will drop support for FMA4 in the future.