linux - How do I know if I can compile with FMA instruction sets?


I have seen questions about how to use the FMA instruction sets, but before I start using them, I'd first like to know whether I can (i.e., whether my processor supports them). I found a post saying that I needed to look at the output of (working on Linux):

more /proc/cpuinfo 

to find out. I get this:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 30
model name      : Intel(R) Xeon(R) CPU           X3470  @ 2.93GHz
stepping        : 5
cpu MHz         : 2933.235
cache size      : 8192 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5866.47
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual

What seems interesting is the flags part, but I'm not sure how to find out from this list whether my processor supports these instructions.

Does anyone know how to find out? Thank you.
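One way to read the output above is to look for a whole-word fma entry in the flags line (AMD's older variant appears as fma4). The flags line shown above contains neither, so that Xeon X3470 does not report FMA support. The following is a minimal C sketch of that check, assuming an x86 Linux system whose /proc/cpuinfo has a "flags" line; the helper names line_has_flag and cpuinfo_has_flag are made up for this example.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `flag` appears as a whole, whitespace-separated word in
   `line`. Whole-word matching matters: a plain strstr() for "fma" would
   also match "fma4". */
static bool line_has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;
    while ((p = strstr(p, flag)) != NULL) {
        bool start_ok = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        char after = p[n];
        bool end_ok = after == '\0' || after == ' ' || after == '\t' || after == '\n';
        if (start_ok && end_ok)
            return true;
        p += 1; /* keep searching after this partial match */
    }
    return false;
}

/* Look for `flag` in the first "flags" line of /proc/cpuinfo (x86 Linux).
   Returns 1 if found, 0 if not, -1 if the file cannot be opened.
   Assumes the flags line fits in the buffer below. */
static int cpuinfo_has_flag(const char *flag)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1;
    char line[8192];
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            found = line_has_flag(line, flag) ? 1 : 0;
            break; /* the first CPU's flags line is enough */
        }
    }
    fclose(f);
    return found;
}
```

For example, cpuinfo_has_flag("fma") would return 0 on the machine from the question.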

This is an old question, but still a relevant one.

I assume you want to detect it in C/C++ at compile time.

The FP_FAST_FMA macro is not a reliable way to detect the FMA instruction set. This macro is defined in "math.h"/<cmath> if std::fma is faster than x*y+z, which is possible only if it's an intrinsic function based on the FMA instruction set. Otherwise it uses a non-intrinsic function, which is slow. As of 2016, only GCC's default glibc/libstdc++ defines this macro; other standard library implementations don't (including LLVM libc++, ICC's and MSVC's). That doesn't mean they don't implement std::fma as an intrinsic when possible; they just forgot to define the macro.
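To see how the two kinds of macros behave on a particular toolchain, a small probe like the one below could be compiled with and without FMA enabled (for example with and without -mfma on GCC/Clang). It is only a sketch: fma_macro_mask is a hypothetical name, and the result depends on both the compiler and the C library in use.

```c
#include <math.h>

/* Report which FMA-related macros are visible in this translation unit.
   bit 0: FP_FAST_FMA, defined (or not) by the C library's <math.h>
   bit 1: __FMA__,     defined (or not) by the compiler from target flags */
static int fma_macro_mask(void)
{
    int mask = 0;
#ifdef FP_FAST_FMA
    mask |= 1;
#endif
#ifdef __FMA__
    mask |= 2;
#endif
    return mask;
}
```

A result of 2 (only __FMA__ set) is the situation described above: the FMA instructions are enabled, but the library does not advertise that through FP_FAST_FMA.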

Reliable FMA detection

To reliably detect FMA (or any other instruction set) at compile time, you need to use instruction-set-specific macros. These macros are defined by the compiler based on the selected target architecture and/or instruction sets.

There is the __FMA__ macro for FMA/FMA3 support and the __FMA4__ macro for AMD FMA4 support. GCC, Clang and ICC define them.
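As an illustration of selecting a code path from these macros, the sketch below computes a[i]*b[i] + c[i] for four doubles. It assumes GCC or Clang headers: _mm256_fmadd_pd comes from <immintrin.h> (FMA3), and _mm256_macc_pd is the AMD FMA4 intrinsic reachable through <x86intrin.h>. The function name muladd4 is hypothetical, and any target without either macro falls back to plain C.

```c
#if defined(__FMA__)
#include <immintrin.h>   /* _mm256_fmadd_pd (FMA3) */
#elif defined(__FMA4__)
#include <x86intrin.h>   /* _mm256_macc_pd (AMD FMA4, GCC/Clang) */
#endif

/* out[i] = a[i] * b[i] + c[i] for i = 0..3, with the code path chosen at
   compile time from the instruction-set macros. */
static void muladd4(const double *a, const double *b, const double *c, double *out)
{
#if defined(__FMA__)
    __m256d r = _mm256_fmadd_pd(_mm256_loadu_pd(a), _mm256_loadu_pd(b),
                                _mm256_loadu_pd(c));
    _mm256_storeu_pd(out, r);
#elif defined(__FMA4__)
    __m256d r = _mm256_macc_pd(_mm256_loadu_pd(a), _mm256_loadu_pd(b),
                               _mm256_loadu_pd(c));
    _mm256_storeu_pd(out, r);
#else
    /* No FMA enabled for this target: separate multiply and add. */
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] * b[i] + c[i];
#endif
}
```

Checking __FMA__ first means a CPU that has both (such as Piledriver) uses the FMA3 path.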

Unfortunately, MSVC doesn't define any instruction-set-specific macros other than __AVX__ and __AVX2__.

Cross-compiler FMA detection

For Intel processors, FMA was introduced together with AVX2 in Intel Haswell.

For AMD processors, things are a little bit messy. FMA4 was introduced together with AVX and XOP in AMD Bulldozer. FMA3 (the Intel FMA equivalent) was introduced in AMD Piledriver. You can distinguish Piledriver from its predecessor Bulldozer at compile time by the presence of the FMA (__FMA__ macro) and BMI (__BMI__ macro) instruction sets. Unfortunately, MSVC defines neither.
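Following that reasoning, a compile-time classification might look like the sketch below. The function name and the returned strings are illustrative only. It relies purely on the GCC/Clang/ICC macros, so under MSVC, which per the answer defines none of them, it always reaches the last branch.

```c
/* Classify the compile-time target from instruction-set macros only.
   Bulldozer has FMA4 but neither FMA3 nor BMI, so an FMA4 target that also
   has __FMA__ or __BMI__ is Piledriver-class or newer. */
static const char *fma_target_kind(void)
{
#if defined(__FMA4__) && (defined(__FMA__) || defined(__BMI__))
    return "FMA4 plus FMA3/BMI: Piledriver-class AMD target";
#elif defined(__FMA4__)
    return "FMA4 without FMA3/BMI: Bulldozer-class AMD target";
#elif defined(__FMA__)
    return "FMA3 target";
#else
    return "no FMA macros defined";
#endif
}
```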

Nevertheless, on both Intel and AMD processors, FMA/FMA3 is supported if AVX2 is present.

If you want cross-compiler detection of whether the target architecture supports FMA/FMA3, you must detect the __AVX2__ macro, since it is defined by all major compilers (including MSVC) if AVX2 is enabled:

#if !defined(__FMA__) && defined(__AVX2__)
    #define __FMA__ 1
#endif
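A variant of the snippet above, sketched here under the assumption that you control the macro name, avoids defining __FMA__ yourself, since identifiers starting with two underscores are reserved for the implementation. HAVE_FMA3 and have_fma3 are hypothetical names.

```c
/* Project-specific macro (hypothetical name) instead of redefining the
   compiler-reserved __FMA__.
   GCC/Clang/ICC: __FMA__ is set by -mfma, -march=haswell, etc.
   MSVC: only __AVX2__ is available, set by /arch:AVX2. */
#if defined(__FMA__) || defined(__AVX2__)
#  define HAVE_FMA3 1
#else
#  define HAVE_FMA3 0
#endif

static int have_fma3(void)
{
    return HAVE_FMA3;
}
```

One caveat: on GCC and Clang, -mavx2 on its own defines __AVX2__ without enabling FMA code generation, so HAVE_FMA3 here means the target CPU should support FMA3, not necessarily that the compiler will emit FMA instructions or accept the FMA intrinsics.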

Unfortunately, there is no reliable way to detect AMD FMA4 using only the __AVX__ and __AVX2__ macros.

Notes

FMA instructions are available in your program only if they are enabled in the compiler. In GCC and Clang you need to set a proper target architecture (like -march=haswell) or manually enable the FMA instruction set with the -mfma flag. ICC enables FMA automatically with the -xavx2 flag. MSVC enables FMA automatically with the /arch:AVX2 option.
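If you cannot build the whole program with FMA enabled, GCC and Clang also let you enable it for a single function and choose at run time. The sketch below assumes those compilers on x86: __attribute__((target("fma"))), __builtin_fma and __builtin_cpu_supports("fma") are GCC/Clang extensions, and the dot, dot_fma and dot_plain names are hypothetical. It is one possible dispatch pattern, not the only one.

```c
#include <stddef.h>

/* Portable fallback: separate multiply and add. */
static double dot_plain(const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* GCC/Clang extension: compile only this function as if -mfma were given,
   which lets the compiler emit an FMA instruction for __builtin_fma here. */
__attribute__((target("fma")))
static double dot_fma(const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc = __builtin_fma(x[i], y[i], acc);
    return acc;
}
#endif

/* Use the FMA version only if the CPU running the program reports FMA. */
static double dot(const double *x, const double *y, size_t n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("fma"))
        return dot_fma(x, y, n);
#endif
    return dot_plain(x, y, n);
}
```

The compile-time macros discussed above only describe the target you compiled for; a run-time check like this is what tells you whether the machine actually executing the binary has FMA.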

AMD has announced that it will drop support for FMA4 in the future.

