xml - Functions and subselects in a single FLWOR
I'm writing an XQuery to analyse large numbers of XML files that store queries similar to the example below. From these queries I'd like to calculate averages, sums, and other information on various subelements. Additionally, I'd like to generate subsections of queries in the same document, for instance the queries that have no hits.
As I'll be manipulating hundreds of thousands of XML files, I'd like to make the XQuery as efficient as possible. I've tried to use a single `for` iteration across the documents, but I cannot figure out how to derive the information I need.
Here's a sample XML file:
<query>
  <querystring>gigabyte sapphire gtx-860</querystring>
  <statuscode>0</statuscode>
  <querytime>0.04669069110297385</querytime>
  <hits>8</hits>
  <date>2013-05-02</date>
  <time>12:07:07</time>
  <lastmodified>12:07:07</lastmodified>
  <pages resultsperpage="10" clickcount="2">
    <page resultcount="8" visited="true">
      <result index="1" clickindex="0" timeviewed="0" pid="85405" title="ddr3 1024 mb" />
      <result index="2" clickindex="1" timeviewed="178" pid="54065" title="atk excellium	" />
      <result index="3" clickindex="0" timeviewed="0" pid="74902" title="intel e9650" />
      <result index="4" clickindex="0" timeviewed="0" pid="56468" title="asus radeon hd 7980" />
      <result index="5" clickindex="0" timeviewed="0" pid="31072" title="intel e7500" />
      <result index="6" clickindex="0" timeviewed="0" pid="26620" title="ddr3 2048 mb" />
      <result index="7" clickindex="2" timeviewed="92" pid="55625" title="gigabyte sapphire 7770" />
      <result index="8" clickindex="0" timeviewed="0" pid="67701" title="intel e9650" />
    </page>
  </pages>
</query>
Here's my XQuery:
let $doc := collection('file:///c:/rep/xml/input?select=*.xml')
for $y in (
  <queries>
  {
    (: one <query> summary element per input document :)
    for $x in $doc
    let $hits := $x/query/hits
    return <query hits="{$hits}">{$x/query/querystring/string()}</query>
  }
  </queries>
)
let $avghits := avg(data($y/query/@hits))
let $numqueries := count($y/*)
return <statistics avghits="{$avghits}" numqueries="{$numqueries}"/>
This correctly returns <statistics numqueries="10" avghits="19.7"/> for a sample of 10 XML files. Is this the right approach? I seem to need a double `for` so that I can group the queries from the disjoint files, as I can't seem to run functions on them otherwise.
I also need to repeat the queries inside the created <statistics> element. Do I need to repeat the FLWOR statement? I can't bring the summed or averaged values outside the statement that calculates them, yet I can't calculate them and perform a subselect in the same statement, since I'll have to include a `where` clause to filter them.
(Update) This is the query I've come up with to include subsections of queries, but as mentioned I'm worried about performance.
let $doc := collection('file:///c:/rep/xml/input?select=*.xml')
for $y in (
  <queries>
  {
    for $x in $doc
    let $hits := $x/query/hits
    return <query hits="{$hits}">{$x/query/querystring/string()}</query>
  }
  </queries>
)
let $avghits := avg(data($y/query/@hits))
let $numqueries := count($y/*)
return
  <statistics avghits="{$avghits}" numqueries="{$numqueries}">
  {
    (: second pass over the documents: keep only queries with fewer than 10 hits :)
    for $x in $doc
    let $hits := $x/query/hits
    where $x/query/hits < 10
    return <query hits="{$hits}">{$x/query/querystring/string()}</query>
  }
  </statistics>
Will the XQuery processor optimise these statements, or will it access the XML files every time it loops across them? Will the first `let` statement prevent this?
This is the kind of document I'm aiming to generate:
<dailystats date="2013-04-15">
  <daystats>
    <querycount>24644</querycount>
    <errors>0</errors>
    <emptysearches>643</emptysearches>
    <averagesearchtime>0.0213</averagesearchtime>
    <averagesearchesperhour>236</averagesearchesperhour>
  </daystats>
  <storedqueries>
    <failedsearches>
      <failedsearch time="23:33:34" query="blurey" searchtime="0.0524" />
    </failedsearches>
  </storedqueries>
</dailystats>
If you are worried about performance, you should use an XML database (if you are not doing so already) and improve performance by indexing your data. For example, when using BaseX and loading your XML files into a database, you can access all nodes using `db:open("your-db")`, thus avoiding the nested loops. You can also use database-specific indexes to speed up your query. If you have a simple XQuery processor working on the file system, it has to touch each XML file, because it knows nothing about the data in each file.
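As a rough illustration of the database approach, here is a minimal sketch that binds all <query> elements once and then computes both the aggregates and the filtered subsection from that single binding. It assumes the files have already been loaded into a BaseX database called "your-db" (a placeholder name) and that the installed BaseX version provides `db:open` as named above; the database module has changed between BaseX releases, so check the documentation for your version.

```xquery
(: Sketch only. Assumes a BaseX database named "your-db" that contains
   the <query> documents, and a BaseX version that provides db:open. :)
let $queries := db:open("your-db")/query
return
  <statistics avghits="{avg($queries/hits)}" numqueries="{count($queries)}">
  {
    (: subsection: queries with fewer than 10 hits, taken from the same binding :)
    for $q in $queries
    where $q/hits < 10
    return <query hits="{$q/hits}">{$q/querystring/string()}</query>
  }
  </statistics>
```

Because `avg()` and `count()` accept whole sequences, the intermediate <queries> element and the outer `for` from the question are not needed here.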
Apart from that, your XQuery looks fine to me. The optimization, as I tried to point out, heavily depends on the processor/database you are using.
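For a plain file-based processor, the same single-binding idea can be sketched towards the target document from the question. The mapping from source elements to output fields is my assumption, not something the question states: an empty search is taken to mean hits = 0, an error to mean a non-zero statuscode, and searchtime to correspond to querytime. Grouping by date and the per-hour average are left out. The collection URI is copied from the question.

```xquery
(: Sketch only: the field mapping below is assumed, see the note above. :)
let $queries := collection('file:///c:/rep/xml/input?select=*.xml')/query
let $empty := $queries[hits = 0]
return
  <dailystats>
    <daystats>
      <querycount>{count($queries)}</querycount>
      <errors>{count($queries[statuscode != 0])}</errors>
      <emptysearches>{count($empty)}</emptysearches>
      <averagesearchtime>{avg($queries/querytime)}</averagesearchtime>
    </daystats>
    <storedqueries>
      <failedsearches>
      {
        (: one entry per query that returned no hits :)
        for $q in $empty
        return <failedsearch time="{$q/time}" query="{$q/querystring}" searchtime="{$q/querytime}"/>
      }
      </failedsearches>
    </storedqueries>
  </dailystats>
```

Whether the files are read only once here is still up to the processor, so it is worth measuring on a realistic sample.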
Yeah, you would have to run a test. It is impossible to predict the actual runtime, because it heavily depends on the data and the query you have. However, it shouldn't be hard to switch to a database later on, so I wouldn't worry about it.