java - Massive multiprogramming and read-only file access
I am trying to create a dictionary-based tagger running on a Hadoop cluster using Pig. Basically, for each document (quite large text documents, a few MB each), it runs each word in each sentence against a dictionary and reads the corresponding value.
There are a few hundred Java programs (not threads) running in parallel, all using the same dictionary file in read-only mode. The idea is to load the dictionary as text, create a Map from it, and query against that.
Question: what should I be prepared for? Is it even remotely logical to want to read the same file in a multiprogramming environment, or should I first copy the (relatively small) file to each instance of the program? Is BufferedReader what I should use while reading the file?
There is little structured documentation on multiprogramming (compared to multithreading), so I am a bit afraid of running into a wall doing so.
Note: you are allowed to answer that my way of thinking is totally wrong, if you provide me with a better way ;-)
I think your approach is fine. You should load the dictionary from the DistributedCache into memory, and then run your checks against the memory-loaded dictionary (e.g., a HashMap).
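As a rough sketch of that idea: once the DistributedCache has copied the dictionary to the local disk of each node, every process can read its local copy with a BufferedReader and keep the entries in a HashMap. Since the file is only read, no locking between processes is needed. The class name DictionaryTagger, the tab-separated "term<TAB>tag" file format, and the "word/tag" output are assumptions made for illustration only, and the Hadoop/Pig wiring that hands you the local path is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: loads the dictionary once per process, then tags words in memory.
final class DictionaryTagger {
    private final Map<String, String> dictionary;

    private DictionaryTagger(Map<String, String> dictionary) {
        this.dictionary = dictionary;
    }

    // Reads the whole file once. Assumed format: one "term<TAB>tag" entry per line.
    static DictionaryTagger load(Path dictionaryFile) throws IOException {
        Map<String, String> map = new HashMap<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(dictionaryFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab <= 0) {
                    continue; // skip blank or malformed lines
                }
                String term = line.substring(0, tab).toLowerCase(Locale.ROOT);
                map.put(term, line.substring(tab + 1));
            }
        }
        return new DictionaryTagger(map);
    }

    // Looks up each whitespace-separated word; returns "word/tag" for every hit.
    List<String> tag(String sentence) {
        List<String> tags = new ArrayList<>();
        for (String word : sentence.split("\\s+")) {
            String tag = dictionary.get(word.toLowerCase(Locale.ROOT));
            if (tag != null) {
                tags.add(word + "/" + tag);
            }
        }
        return tags;
    }
}
```

Call load once when the task starts (not once per document or per word) and reuse the same instance for every sentence, so the file is read a single time per process.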