Count the top words for each author in a MapReduce framework


I have a collection of files. Each file contains an author's name and the words that author used. I am trying to write MapReduce code to count each author's top N words. The tricky part is that a file may contain multiple authors. How should the MapReduce job be designed? Pseudocode plus a little explanation is enough. Thanks.

In a first MR job, count the words used by each author by creating a composite key of author+word, with the count as the value.
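The counting job can be sketched as a small in-memory simulation in Python. This is only an illustration of the idea, not Hadoop code: the input line format ("author<TAB>word word ...") and the function names `map_job1`, `reduce_job1` and `run_job1` are assumptions, since the question does not say how the files are laid out.

```python
from collections import defaultdict


def map_job1(line):
    """Emit ((author, word), 1) for every word on a line.

    Assumes a hypothetical line format of "author<TAB>word word ...".
    """
    author, _, text = line.partition("\t")
    for word in text.split():
        yield (author, word), 1


def reduce_job1(key, counts):
    """Sum the partial counts for one (author, word) key."""
    return key, sum(counts)


def run_job1(lines):
    """Simulate the shuffle in memory: group mapper output by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_job1(line):
            groups[key].append(value)
    return dict(reduce_job1(key, values) for key, values in groups.items())
```

Because the key carries the author, it does not matter that one file mixes several authors: each (author, word) pair is counted independently of where it appeared.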

A second MR job reads the pairs (author+word, count) and maps them to (author+count, word+count). Write a comparator that orders keys first by author and then by count (largest to smallest), and a grouper that treats two keys with the same author as being in the same reduce group, regardless of count. You'll also need a partitioner to make sure all pairs for an author go to the same partition. The reducer is then called once for each author, and the values (word+count) are provided as an iterable with the largest count first. In the reducer, write the author, word and count for the first N records of the iterable.
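The partitioner, comparator, grouper and reducer described above can be sketched in plain Python as follows. It is a simulation under stated assumptions, not a Hadoop implementation: the names `map_job2`, `partition`, `sort_key` and `run_job2` are hypothetical, the input is assumed to be ((author, word), count) records from the counting job, and the partitions are just in-memory lists.

```python
from itertools import groupby, islice


def map_job2(record):
    """Re-key ((author, word), count) as ((author, count), (word, count))."""
    (author, word), count = record
    return (author, count), (word, count)


def partition(key, num_partitions):
    """Partitioner: use the author only, so one author never spans partitions."""
    author, _count = key
    return hash(author) % num_partitions


def sort_key(pair):
    """Comparator: author ascending, then count descending."""
    (author, count), _value = pair
    return author, -count


def run_job2(records, n, num_partitions=2):
    """Return (author, word, count) for each author's n most frequent words."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        key, value = map_job2(record)
        partitions[partition(key, num_partitions)].append((key, value))

    output = []
    for part in partitions:
        part.sort(key=sort_key)
        # Grouper: keys with the same author form one reduce group.
        for author, group in groupby(part, key=lambda pair: pair[0][0]):
            # Reducer: values arrive largest count first, so keep the first n.
            for _key, (word, count) in islice(group, n):
                output.append((author, word, count))
    return output
```

In Hadoop itself these pieces would typically be a custom `Partitioner` plus the comparators registered on the job with `setSortComparatorClass` and `setGroupingComparatorClass`. The sketch does not define how ties between equal counts are broken.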

