javascript - Riak Map Reduce in JS returning limited data
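So I have Riak running on 2 EC2 servers, and I am using Python to run JavaScript MapReduce. The servers have been clustered. This is mostly being used as a "proof of concept".
There are 50 keys in the bucket, and all the map/reduce function does is re-format the data. This is only for testing the map/reduce functionality in Riak.
Problem: the output only shows [{u'e': 2, u'undefined': 2, u'w': 2}]. This is wrong. The logs show that all the keys have been "processed", but only 2 are returned. So my question is why this is happening, and whether I am missing something important.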
Code:

    import riak

    client = riak.RiakClient()
    query = riak.RiakMapReduce(client).add('raw_hits10')
    query.map("""function(v) {
        var data = JSON.parse(v.values[0].data);
        return [[data, 1]];
    }""")
    query.reduce("""function(vk) {
        var res = {};
        for (var indx in vk) {
            var key_t = vk[indx][0];
            var val_t = vk[indx][1];
            ejsLog('/tmp/map_reduce.log', key_t + "--- " + val_t);
            res[key_t] = 2;
        }
        return [res];
    }""")

    for res in query.run():
        print res

The results printed:

    [{u'e': 2, u'undefined': 2, u'w': 2}]

This makes no sense.
In order to avoid having to load all the data from the preceding phase into memory on the coordinating node before running the reduce phase (which can be problematic for large MapReduce jobs), the reduce function is run multiple times. Every iteration gets a batch of results from the preceding phase as well as the output of the earlier reduce phase iteration(s). The default batch size is 20, but this is configurable. Because the results of one reduce phase iteration are fed in as input to the next iteration, reduce phase functions need to be designed to handle this, and strategies are described here.
It is possible to force Riak to run the reduce phase only once for the entire input set by specifying the 'reduce_phase_only_1' parameter, but this is not recommended, especially for large jobs.
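To illustrate what "designed to handle this" can look like, here is a minimal sketch of a reduce function that accepts both fresh map output and the output of its own earlier iterations. It assumes the map phase emits [key, count] pairs with string keys, which is not exactly what the code in the question does. The names reduceCounts and simulateBatchedReduce are hypothetical, and the simulation is only a local stand-in for batching, not Riak's actual scheduling.

```javascript
// Hypothetical re-reduce-safe counter. Each input item is either
//   - a fresh map result: [key, count], or
//   - the output of an earlier reduce iteration: {key: count, ...}
function reduceCounts(values) {
  var totals = {};
  for (var i = 0; i < values.length; i++) {
    var item = values[i];
    if (Object.prototype.toString.call(item) === '[object Array]') {
      // Fresh map output: add the count to the running total for this key
      var key = String(item[0]);
      totals[key] = (totals[key] || 0) + item[1];
    } else {
      // Partial result from an earlier iteration: merge its counts
      for (var k in item) {
        if (Object.prototype.hasOwnProperty.call(item, k)) {
          totals[k] = (totals[k] || 0) + item[k];
        }
      }
    }
  }
  return [totals];
}

// Local stand-in for batching (an assumption for illustration only):
// every iteration sees the previous reduce output plus the next batch.
function simulateBatchedReduce(mapOutput, batchSize) {
  var carried = [];
  for (var i = 0; i < mapOutput.length; i += batchSize) {
    var batch = carried.concat(mapOutput.slice(i, i + batchSize));
    carried = reduceCounts(batch);
  }
  return carried;
}
```

The point of the design is that the function gives the same totals whether it sees all inputs at once or in batches. The reduce function in the question instead treats every input as a [key, value] pair, so an object returned by an earlier iteration is indexed with [0] and [1] and yields keys such as undefined.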