performance - Why is this GLSL shader so slow?


I am trying to raytrace on a grid in a fragment shader. I have written the shader below (the vertex shader just draws a screen quad).

    #version 150

    uniform mat4 minvproj, minvrot;
    uniform vec4 vcampos;

    varying vec4 vposition;

    // Returns 1 if the grid cell at p is filled (two hardcoded slabs), 0 otherwise.
    int test(vec3 p)
    {
        if (p.x > -4.0 && p.x < 4.0
         && p.y > -4.0 && p.y < 4.0
         && ((p.z < -4.0 && p.z > -8.0) || (p.z > 4.0 && p.z < 8.0)))
            return 1;
        return 0;
    }

    void main(void)
    {
        vec4 cout = vec4(0, 0, 0, 0);

        vec4 vworldspace = minvrot * minvproj * vposition;
        vec3 vrayorg = vcampos.xyz;
        vec3 vraydir = normalize(vworldspace.xyz);

        // http://en.wikipedia.org/wiki/xiaolin_wu%27s_line_algorithm
        vec3 adelta = abs(vraydir);
        int increaser;
        vec3 gradient, sgradient;
        if (adelta.x > adelta.y && adelta.x > adelta.z)
        {
            increaser = 0;
            gradient = vec3(vraydir.x > 0.0 ? 1.0 : -1.0, vraydir.y / vraydir.x, vraydir.z / vraydir.x);
            sgradient = vec3(0.0, gradient.y > 0.0 ? 1.0 : -1.0, gradient.z > 0.0 ? 1.0 : -1.0);
        }
        else if (adelta.y > adelta.x && adelta.y > adelta.z)
        {
            increaser = 1;
            gradient = vec3(vraydir.x / vraydir.y, vraydir.y > 0.0 ? 1.0 : -1.0, vraydir.z / vraydir.y);
            sgradient = vec3(gradient.x > 0.0 ? 1.0 : -1.0, 0.0, gradient.z > 0.0 ? 1.0 : -1.0);
        }
        else
        {
            increaser = 2;
            gradient = vec3(vraydir.x / vraydir.z, vraydir.y / vraydir.z, vraydir.z > 0.0 ? 1.0 : -1.0);
            sgradient = vec3(gradient.x > 0.0 ? 1.0 : -1.0, gradient.y > 0.0 ? 1.0 : -1.0, 0.0);
        }

        vec3 walk = vrayorg;
        for (int i = 0; i < 64; ++i)
        {
            vec3 fwalk = floor(walk);
            if (test(fwalk) > 0)
            {
                vec3 c = abs(fwalk) / 4.0;
                cout = vec4(c, 1.0);
                break;
            }
            vec3 nextwalk = walk + gradient;
            vec3 fnextwalk = floor(nextwalk);

            bool xchanged = fnextwalk.x != fwalk.x;
            bool ychanged = fnextwalk.y != fwalk.y;
            bool zchanged = fnextwalk.z != fwalk.z;

            // Antialiasing part: also test the adjacent cells on the two minor axes.
            if (increaser == 0)
            {
                if ((ychanged && test(fwalk + vec3(0.0, sgradient.y, 0.0)) > 0)
                 || (zchanged && test(fwalk + vec3(0.0, 0.0, sgradient.z)) > 0)
                 || (ychanged && zchanged && test(fwalk + vec3(0.0, sgradient.y, sgradient.z)) > 0))
                {
                    vec3 c = abs(fwalk) / 4.0;
                    cout = vec4(c, 1.0);
                    break;
                }
            }
            else if (increaser == 1)
            {
                if ((xchanged && test(fwalk + vec3(sgradient.x, 0.0, 0.0)) > 0)
                 || (zchanged && test(fwalk + vec3(0.0, 0.0, sgradient.z)) > 0)
                 || (xchanged && zchanged && test(fwalk + vec3(sgradient.x, 0.0, sgradient.z)) > 0))
                {
                    vec3 c = abs(fwalk) / 4.0;
                    cout = vec4(c, 1.0);
                    break;
                }
            }
            else
            {
                if ((xchanged && test(fwalk + vec3(sgradient.x, 0.0, 0.0)) > 0)
                 || (ychanged && test(fwalk + vec3(0.0, sgradient.y, 0.0)) > 0)
                 || (xchanged && ychanged && test(fwalk + vec3(sgradient.x, sgradient.y, 0.0)) > 0))
                {
                    vec3 c = abs(fwalk) / 4.0;
                    cout = vec4(c, 1.0);
                    break;
                }
            }

            walk = nextwalk;
        }

        gl_FragColor = cout;
    }

As long as I am looking at close grid items, like the hardcoded ones, the framerate looks acceptable (400+ fps on a GeForce 680M), although lower than I would expect compared to other shaders I have written so far. But when I look at emptiness (so the loop goes all the way to 64), the framerate is terrible (40 fps). I get around 1200 fps when looking so close at the grid that every pixel ends in the same close grid item.

Although I understand that doing this loop for every pixel is some work, it is still easy, basic math, especially since I have removed the texture lookup and used a simple test, so I don't understand why it has to slow down so hard. My GPU has 16 cores and runs at 700+ MHz. I am rendering at 960x540, which is 518,400 pixels. It should be able to handle much more than this, I think.

If I remove the antialiasing part of the above (the part of the code where I test adjacent points based on the increaser value), it gets a little better (100 fps), but come on, with these calculations it shouldn't make that much difference! If I split the code so that increaser is not used and the code below it is written out separately for each case, the framerate stays the same. If I change the ints to floats, nothing changes.

I have done much more intensive and/or complicated shaders before, so why is this one so terribly slow? Can anyone tell me which calculation makes it go so slow?

I am not setting uniforms that are not used or anything like that, and the C code is doing nothing more than just rendering. It is code I have used hundreds of times before.

Anyone?

The short answer is: branching and looping in shaders (can be) evil. But it's more than that: read this topic for further information: Efficiency of branching in shaders

It comes down to this:

A graphics adapter has one or more GPUs, and a GPU has several cores. Every core is designed to run multiple threads, but these threads can only run the exact same code (depending on the implementation).

So if 10 threads each run the loop a different number of times, all 10 threads have to run as long as the longest loop takes (depending on the implementation, the loop might be continued further than necessary, or the thread might be stalling).

The same goes for branches: if a thread has an if, it may be possible (depending on the implementation) that both branches are executed and the result of only one of them is used.
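Applied to the shader in the question, the three-way branch on increaser is one place where neighbouring pixels can take different paths. Below is a minimal sketch of folding those three branches into a single code path. The helper name neighbourHit is hypothetical, and the sketch assumes, as in the question's code, that sgradient is 0.0 on the major axis and +/-1.0 on the other two, and that test(fwalk) has already returned 0 for the current cell.

```glsl
// test() is the cell test from the question's shader; only its prototype is repeated here.
int test(vec3 p);

// Hypothetical helper: performs the same neighbour checks as the three
// increaser branches, but without branching on increaser.
bool neighbourHit(vec3 fwalk, vec3 fnextwalk, vec3 sgradient)
{
    // 1.0 where the cell index changed on a minor axis, 0.0 elsewhere.
    // abs(sgradient) is 0.0 on the major axis, so that axis is masked out.
    vec3 changed = vec3(notEqual(fnextwalk, fwalk)) * abs(sgradient);

    // Each term steps one cell along a single axis; the 0-factor in
    // changed removes the terms that do not apply.
    float hit = changed.x * float(test(fwalk + vec3(sgradient.x, 0.0, 0.0)))
              + changed.y * float(test(fwalk + vec3(0.0, sgradient.y, 0.0)))
              + changed.z * float(test(fwalk + vec3(0.0, 0.0, sgradient.z)));

    // The diagonal neighbour only counts when both minor axes changed.
    float both = step(1.5, changed.x + changed.y + changed.z);
    hit += both * float(test(fwalk + sgradient));

    return hit > 0.5;
}
```

Inside the loop, the whole if / else if / else on increaser would then become a single `if (neighbourHit(fwalk, fnextwalk, sgradient)) { ... break; }`. Note that this version always evaluates four test() calls instead of short-circuiting, so whether it is actually faster depends on the hardware and driver and has to be measured.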

So, in conclusion, it might be (and often is) better to do more math and use 0-factors if you want calculations removed depending on conditions, rather than writing a condition and branching.

For example:

    // using uselighting = 0.0 or 1.0
    return uselighting * clightcolor * cmaterialcolor + (1.0 - uselighting) * cmaterialcolor;

might be better than:

    if (uselighting < 0.5)
        return cmaterialcolor;
    else
        return clightcolor * cmaterialcolor;

But it might not... performance testing is key...
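The arithmetic blend above can also be written with the GLSL built-in mix, which computes a * (1.0 - t) + b * t. A minimal sketch, assuming the colours are vec3 values and uselighting is a float that is exactly 0.0 or 1.0 (the function name shade and the parameter types are assumptions, not taken from the original code):

```glsl
// Hypothetical wrapper around the blend from the example above.
vec3 shade(vec3 clightcolor, vec3 cmaterialcolor, float uselighting)
{
    vec3 lit = clightcolor * cmaterialcolor;

    // uselighting == 0.0 yields cmaterialcolor, uselighting == 1.0 yields lit.
    return mix(cmaterialcolor, lit, uselighting);
}
```

This expresses the same math as the 0-factor version, so the same caveat applies: it may or may not beat the if on a given GPU.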

