sql - Efficiency of Calculating with Different Data Types? -
i have 1.2 billion row data set involves columns of high precision floating point/decimal numbers. requirements have accuracy out 12 decimal places. not need compare perfect equality , not need prevent weird numeric artifacts (like 7 coming out 7.000000000000001).
so, project use either float or decimal(18,15)... or maybe decimal(15,12) if there reason to. calculations need data involve abs(), avg(), addition, subtraction, multiplication, , division. possibly other statistical functions.
which, if any, of data type options efficient task?
edit: here test code trying use answer this. keep getting arithmatic overflow error. not sure how avoid without mucking test changing data types. see can cast, without introducing data types invalidate test? code follows:
if object_id('tempdb..#tempfloat') not null drop table #tempfloat if object_id('tempdb..#tempdecimal') not null drop table #tempdecimal if object_id('tempdb..#tempbinary') not null drop table #tempbinary select cast(rand() float) randa, cast(0 float) calca #tempfloat select cast(rand() decimal(18,15)) randa, cast(0 decimal(18,15)) calca #tempdecimal select cast(rand() binary(8)) randa, cast(0 binary(8)) calca #tempbinary insert #tempfloat ( randa, calca ) ( select cast(rand() float) randa, cast(0 float) calca ) insert #tempdecimal ( randa, calca ) ( select cast(rand() float) randa, cast(0 float) calca ) insert #tempbinary ( randa, calca ) ( select cast(rand() float) randa, cast(0 float) calca ) go -- 9999 update #tempfloat set calca = (abs((randa/2) - 1) * 10000) + (randa - (randa * 2)) update #tempdecimal set calca = (abs((randa/2) - 1) * 10000) + (randa - (randa * 2)) update #tempbinary set calca = (abs((randa/2) - 1) * 10000) + (randa - (randa * 2))
thanks in advance.
as rule of thumb, expect faster calculations doubles numerics or decimals. arbitrary-precision math expensive. floating-point math on modern computers relatively cheap. (am 1 remembers having buy floating-point processor speed spreadsheet calculations?)
but that's rule of thumb. if have lots of rows, lots of columns, , relatively few digits in numeric or decimal types, might find disk i/o bottleneck. (postgresql supports 1000 digits of precision in numeric data types. i'd call relatively many digits.)
the best thing can company
- build test table that's identical in structure production table, ,
- load enough rows of random data won't fit ram.
then run , time set of queries test performance.
use test table that's identical production, because having lot of columns exercises disk subsystem differently having few columns. use lot of random data, because want try approximate behavior of troublesome table. (testing narrow tables, you're doing in test code, won't sufficiently stress system.)
if table financially significant, it's worth reading dbms's documentation create table statement. you'll find several options can affect performance. oracle's create table statement documentation takes 50 pages print on paper. (50, not typo.)
Comments
Post a Comment