c++ - What implementation of average is the most accurate?
Given two implementations of an average function:

    float average(const std::vector<float>& seq)
    {
        float sum = 0.0f;
        for (auto&& value : seq) {
            sum += value;
        }
        return sum / seq.size();
    }

and:

    float average(const std::vector<float>& seq)
    {
        float avg = 0.0f;
        for (auto&& value : seq) {
            avg += value / seq.size();
        }
        return avg;
    }

To illustrate my question, imagine we have a huge difference in the input data, like so:

    1.0f, 0.0f, 0.0f, 0.0f, 1000000.0f

My guess is that in the first implementation, sum can grow "too much" and lose the least significant digits, ending up as 1000000.0f instead of 1000001.0f at the end of the sum loop.
On the other hand, the second implementation seems theoretically less efficient, due to the number of divisions to perform (I haven't profiled anything, this is a blind guess).
So, is one of these implementations preferable to the other? Is it true that the first implementation is less accurate?
I wouldn't count on the second being more accurate. The differences in the size of the elements are divided by the length of the vector, but each division introduces some additional imprecision.
If accuracy is a problem, the first step should be to use double. Even if the vector is float, for memory reasons, the calculations within the function should be done in double.
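As a minimal sketch of that first step: the storage stays float and only the accumulator is double. The name `average_double` and the choice to return 0 for an empty vector are my own assumptions for illustration, not something from the question.

```cpp
#include <vector>

// Hypothetical variant of the first implementation: the input stays float,
// but the running sum is kept in double.
float average_double(const std::vector<float>& seq)
{
    if (seq.empty()) {
        return 0.0f;  // assumption: an empty input yields 0 rather than NaN
    }
    double sum = 0.0;
    for (float value : seq) {
        sum += value;  // each float is widened to double before the addition
    }
    // Divide in double, then narrow to float once at the very end.
    return static_cast<float>(sum / static_cast<double>(seq.size()));
}
```

The only rounding to float happens in the final conversion, so small elements are not swallowed by a large running total the way they can be with a float accumulator.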
Beyond that, for large numbers of elements, you should probably use the Kahan algorithm rather than naïvely adding the elements. Although it adds a number of operations in the loop, it keeps track of the error and will result in more accuracy.
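A rough sketch of Kahan (compensated) summation with float everywhere might look like the following. The name `average_kahan` and the empty-vector behaviour are assumptions of mine for illustration.

```cpp
#include <vector>

// Hypothetical Kahan-summation variant: float everywhere, plus a second
// float that carries the rounding error of the previous addition.
float average_kahan(const std::vector<float>& seq)
{
    if (seq.empty()) {
        return 0.0f;  // assumption: an empty input yields 0 rather than NaN
    }
    float sum = 0.0f;
    float compensation = 0.0f;  // error left over from the previous addition
    for (float value : seq) {
        float corrected = value - compensation;  // re-apply the lost part
        float next = sum + corrected;            // low-order bits may be lost here
        compensation = (next - sum) - corrected; // recover what was just lost
        sum = next;
    }
    return sum / static_cast<float>(seq.size());
}
```

One caveat: this depends on the compiler evaluating the arithmetic as written. Options that allow floating-point reassociation, such as `-ffast-math` in GCC and Clang, can simplify the compensation term away, so the sketch assumes they are off.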
Edit:
Just for the fun of it, I wrote a small program which used the following code to generate the vector:

    std::vector<float> v;
    v.push_back( 10000000.0f );
    for ( int count = 10000000; count > 0; -- count ) {
        v.push_back( 0.1f );
    }

The results of the average should be 1.0999999 (practically speaking, 1.1). Using either of the algorithms in the original posting, however, the result is 0.999999881: an error of 10%. Just changing sum to have type double in the first algorithm results in 1.0999999, which is about as accurate as you can get. Using the Kahan algorithm (with float everywhere) gives the same results.
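To see where the 10% goes, it may help to check a single addition. Assuming float is IEEE 754 single precision (24-bit significand), neighbouring values around 10,000,000 are 1.0 apart, so adding 0.1f rounds straight back to the starting value. The helper names below (`average_naive`, `is_absorbed`, `make_test_vector`) are hypothetical and exist only for this sketch.

```cpp
#include <vector>

// The first implementation from the question, renamed for this demo.
// As in the question, there is no check for an empty vector.
float average_naive(const std::vector<float>& seq)
{
    float sum = 0.0f;
    for (float value : seq) {
        sum += value;
    }
    return sum / seq.size();
}

// Hypothetical helper: true when adding `small` to `big` leaves `big` unchanged.
bool is_absorbed(float big, float small)
{
    volatile float sum = big + small;  // volatile forces the result into a real float
    return sum == big;
}

// Builds the same vector as the snippet above: one large value followed by
// ten million copies of 0.1f.
std::vector<float> make_test_vector()
{
    std::vector<float> v;
    v.reserve(10000001);
    v.push_back(10000000.0f);
    v.insert(v.end(), 10000000, 0.1f);
    return v;
}
```

With these, `is_absorbed(10000000.0f, 0.1f)` should come out true, so the float sum never moves off 10000000 and `average_naive(make_test_vector())` ends up just below 1 instead of near 1.1.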